Abstract
As a basic component of multimedia applications, object detectors are generally trained on a pre-defined, fixed set of classes. In practice, however, new object classes often emerge after the models have been trained. Modern object detectors based on Convolutional Neural Networks (CNNs) suffer from catastrophic forgetting when fine-tuned on new classes without access to the original training data. It is therefore critical to improve the incremental learning capability of object detectors. In this article, we propose a novel Residual-Distillation-based Incremental learning method for Object Detection (RD-IOD). Our approach rests on a triple network built on Faster R-CNN. To enable continuous learning of new classes, we use the original model together with a residual model to guide the incremental model as it learns new classes while maintaining previously learned knowledge. To better preserve the discrimination between the features of old and new classes, the residual model is jointly trained with the incremental model on new classes during the incremental learning procedure. In addition, a two-level distillation scheme is designed to guide the training process, consisting of (1) a general distillation that imitates the original model in feature space, together with a residual distillation on features at both the image level and the instance level, and (2) a joint classification distillation on the output layers. To further preserve the learned knowledge, we design a 2-threshold training strategy to guide the learning of the Region Proposal Network and the detection head. Extensive experiments on VOC2007 and COCO demonstrate that the proposed method effectively learns to incrementally detect objects of new classes while mitigating catastrophic forgetting. Our code is available at https://github.com/yangdb/RD-IOD.
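As an illustrative sketch only: the abstract's two-level distillation can be pictured as a feature-imitation term (the incremental model mimics the frozen original model), a residual term (the jointly trained residual model accounts for the feature shift that new classes introduce), and a classification distillation on the output layers. The concrete loss forms below (mean squared error for features, temperature-scaled KL divergence for logits) and all function names are our assumptions for exposition; the paper's exact formulation, weighting, and the image-/instance-level split are defined in the full text, not here.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def feature_distill_loss(f_inc, f_orig):
    """General distillation: the incremental model imitates the frozen
    original model in feature space (assumed here to be an MSE penalty)."""
    return float(np.mean((f_inc - f_orig) ** 2))

def residual_distill_loss(f_inc, f_orig, f_res):
    """Residual distillation (sketch): the residual model's features are
    assumed to model the shift from old-class to new-class features, so the
    incremental features should stay close to original + residual."""
    return float(np.mean((f_inc - (f_orig + f_res)) ** 2))

def classification_distill_loss(logits_inc, logits_orig, T=2.0):
    """Classification distillation on the output layer: KL divergence
    between temperature-softened class distributions of the original
    and incremental models (standard Hinton-style distillation)."""
    p = softmax(logits_orig / T)
    q = softmax(logits_inc / T)
    return float(np.mean(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)))
```

Each term is zero when the incremental model reproduces its teacher exactly, which is the sense in which distillation counteracts catastrophic forgetting while the ordinary detection loss pulls the model toward the new classes.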
RD-IOD: Two-Level Residual-Distillation-Based Triple-Network for Incremental Object Detection