research-article

RD-IOD: Two-Level Residual-Distillation-Based Triple-Network for Incremental Object Detection

Published: 27 January 2022

Abstract

As a basic component of multimedia applications, object detectors are generally trained on a fixed, pre-defined set of classes. In practice, however, new object classes often emerge after the models are trained. Modern object detectors based on Convolutional Neural Networks (CNNs) suffer from catastrophic forgetting when fine-tuned on new classes without the original training data. It is therefore critical to improve the incremental learning capability of object detectors. In this article, we propose a novel Residual-Distillation-based Incremental learning method on Object Detection (RD-IOD). Our approach rests on a triple-network built on Faster R-CNN. To enable continuous learning from new classes, we use the original model together with a residual model to guide the incremental model as it learns new classes while retaining previously learned knowledge. To better maintain the discrimination between the features of old and new classes, the residual model is trained jointly with the incremental model on new classes during incremental learning. In addition, a two-level distillation scheme guides the training process. It consists of (1) a general distillation that imitates the original model in feature space, along with a residual distillation on the features at both the image level and the instance level, and (2) a joint classification distillation on the output layers. To preserve the learned knowledge, we design a 2-threshold training strategy to guide the learning of the Region Proposal Network and the detection head. Extensive experiments on VOC2007 and COCO demonstrate that the proposed method effectively learns to detect objects of new classes incrementally and mitigates catastrophic forgetting. Our code is available at https://github.com/yangdb/RD-IOD.
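To make the two-level distillation scheme described in the abstract more concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation (see the linked repository for that). The function names, the loss weights, the temperature, and in particular the exact form of the residual term are assumptions chosen for illustration: here the residual term asks the gap between incremental and original features to match the residual model's features, and the classification term is a temperature-softened KL divergence.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def feature_mse(student, teacher):
    """General distillation: mean squared error between feature vectors."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def residual_mse(incremental, original, residual):
    """ASSUMED form of residual distillation (hypothetical, not from the paper):
    the gap (incremental - original) should match the residual model's features."""
    gap = [i - o for i, o in zip(incremental, original)]
    return feature_mse(gap, residual)


def classification_kl(student_logits, teacher_logits, temperature=2.0):
    """Output-layer distillation: KL(teacher || student) on softened scores."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def two_level_loss(feature_levels, student_logits, teacher_logits,
                   w_feat=1.0, w_res=1.0, w_cls=1.0):
    """Level 1: feature losses summed over the given feature levels
    (e.g. image level and instance level). Level 2: classification KL.
    The weights are hypothetical hyperparameters."""
    level_one = 0.0
    for incremental, original, residual in feature_levels:
        level_one += w_feat * feature_mse(incremental, original)
        level_one += w_res * residual_mse(incremental, original, residual)
    level_two = w_cls * classification_kl(student_logits, teacher_logits)
    return level_one + level_two


# Toy inputs: (incremental, original, residual) features per level.
image_level = ([0.5, 1.0], [0.5, 1.0], [0.0, 0.0])
instance_level = ([1.0, 2.0], [1.0, 1.0], [0.0, 1.0])
loss = two_level_loss([image_level, instance_level],
                      student_logits=[2.0, 1.0, 0.1],
                      teacher_logits=[2.0, 1.0, 0.1])
print(loss)
```

In a real detector the feature vectors would be backbone feature maps and RoI-pooled instance features, and the teacher logits would come from the frozen original and residual models; the toy lists above only exercise the arithmetic.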



        • Published in

          ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 18, Issue 1
          January 2022
          517 pages
          ISSN:1551-6857
          EISSN:1551-6865
          DOI:10.1145/3505205

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 27 January 2022
          • Accepted: 1 June 2021
          • Revised: 1 May 2021
          • Received: 1 September 2020


          Qualifiers

          • research-article
          • Refereed
