Abstract
Deep networks have shown remarkable results on the task of object detection. However, their performance drops drastically when they are subsequently trained on novel classes without any samples from the base classes originally used to train the model. This phenomenon is known as catastrophic forgetting. Recently, several incremental learning methods have been proposed to mitigate catastrophic forgetting in object detection. Despite their effectiveness, these methods require the co-occurrence of the unlabeled base classes in the training data of the novel classes. This requirement is impractical in many real-world settings, since the base classes do not necessarily co-occur with the novel classes. In view of this limitation, we consider a more practical setting for object detection in which the base and novel classes never co-occur. We propose the use of unlabeled in-the-wild data to bridge the non co-occurrence caused by the missing base classes during the training of additional novel classes. To this end, we introduce a blind sampling strategy that uses the responses of the base-class model and a pre-trained novel-class model to select a smaller, relevant dataset from the large in-the-wild dataset for incremental learning. We then design a dual-teacher distillation framework to transfer the knowledge distilled from the base- and novel-class teacher models to the student model using the sampled in-the-wild data. Additionally, the novel-class data is included in the training to facilitate the learning of discriminative representations between the base and novel classes. Furthermore, considering that the training samples are all false positives when there is no class overlap in the in-the-wild data, we propose a single-teacher distillation framework to relieve the mutual suppression of the dual-teacher distillation framework and to balance the trade-off between the performance on the base and novel classes. Experimental results on the PASCAL VOC and MS-COCO datasets show that our proposed method significantly outperforms other state-of-the-art class-incremental object detection methods when there is no co-occurrence between the base and novel classes during training.
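The dual-teacher distillation described above can be illustrated with a minimal sketch. The snippet below is not the authors' implementation; it only shows, under assumed tensor shapes and hypothetical names, how a student's class responses on sampled in-the-wild proposals could be matched against the concatenated soft targets of a frozen base-class teacher and a frozen novel-class teacher via temperature-scaled knowledge distillation.

```python
# Minimal sketch (not the authors' implementation) of a dual-teacher
# distillation loss: a frozen base-class teacher and a frozen novel-class
# teacher both score the sampled in-the-wild proposals, and the student is
# trained to match their concatenated class responses. All names, shapes,
# and the temperature value are illustrative assumptions.
import torch
import torch.nn.functional as F

def dual_teacher_distillation_loss(student_logits,        # (N, C_base + C_novel)
                                    base_teacher_logits,   # (N, C_base), frozen teacher
                                    novel_teacher_logits,  # (N, C_novel), frozen teacher
                                    temperature=2.0):
    # Concatenate the two teachers' responses over the class dimension to form
    # a single soft target that covers both base and novel classes.
    teacher_logits = torch.cat([base_teacher_logits, novel_teacher_logits], dim=1)
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    # Standard temperature-scaled KL distillation; the T^2 factor keeps
    # gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

# Example usage on random proposal scores (20 base + 10 novel classes).
if __name__ == "__main__":
    student = torch.randn(8, 30)
    base_t, novel_t = torch.randn(8, 20), torch.randn(8, 10)
    print(dual_teacher_distillation_loss(student, base_t, novel_t))
```

In the single-teacher variant mentioned in the abstract, only one teacher's responses would supply the soft target for a given sample, which is one way to avoid the two distillation signals suppressing each other; the exact formulation is given in the paper itself.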
Data availability
The datasets analysed during the current study are available as follows:
1. MS-COCO 2014 (Lin et al., 2014): https://cocodataset.org/#home
2. PASCAL VOC 2007 (Everingham et al., 2010): http://host.robots.ox.ac.uk/pascal/VOC/index.html
References
Castro, F. M., Marín-Jiménez, M. J., Guil, N., Schmid, C. & Alahari, K. (2018). End-to-end incremental learning. In Proceedings of the European conference on computer vision (ECCV) (pp. 233–248).
Chen, L., Yu, C. & Chen, L. (2019). A new knowledge distillation for incremental object detection. In 2019 international joint conference on neural networks (IJCNN) (pp. 1–7). IEEE.
Dai, J., Li, Y., He, K. & Sun, J. (2016). R-FCN: Object detection via region-based fully convolutional networks. In Advances in neural information processing systems (pp. 379–387).
Dhar, P., Singh, R. V., Peng, K. C., Wu, Z. & Chellappa, R. (2019). Learning without memorizing. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5138–5146).
Dong, N., Zhang, Y., Ding, M., & Lee, G. H. (2021). Bridging non co-occurrence with unlabeled in-the-wild data for incremental object detection. Advances in Neural Information Processing Systems, 34, 30492–30503.
Everingham, M., Van Gool, L., Williams, C. K., Winn, J., & Zisserman, A. (2010). The pascal visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.
Gidaris, S. & Komodakis, N. (2015). Object detection via a multi-region and semantic segmentation-aware CNN model. In Proceedings of the IEEE international conference on computer vision (pp. 1134–1142).
Girshick, R. (2015). Fast R-CNN. In Proceedings of the IEEE international conference on computer vision (pp. 1440–1448).
Girshick, R., Donahue, J., Darrell, T. & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 580–587).
Hao, Y., Fu, Y., Jiang, Y. G. & Tian, Q. (2019). An end-to-end architecture for class-incremental object detection with knowledge distillation. In 2019 IEEE international conference on multimedia and expo (ICME) (pp. 1–6). IEEE.
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9), 1904–1916.
Kemker, R. & Kanan, C. (2017). FearNet: Brain-inspired model for incremental learning. arXiv preprint arXiv:1711.10563
Kirkpatrick, J., Pascanu, R., Rabinowitz, N., Veness, J., Desjardins, G., Rusu, A. A., Milan, K., Quan, J., Ramalho, T., Grabska-Barwinska, A., et al. (2017). Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13), 3521–3526.
Li, Z., & Hoiem, D. (2017). Learning without forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(12), 2935–2947.
Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P. & Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. In European conference on computer vision (pp. 740–755). Springer.
Lin, T. Y., Dollár, P., Girshick, R., He, K., Hariharan, B. & Belongie, S. (2017a). Feature pyramid networks for object detection. In IEEE conference on computer vision and pattern recognition.
Lin, T. Y., Goyal, P., Girshick, R., He, K. & Dollár, P. (2017b). Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision (pp. 2980–2988).
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C. Y. & Berg, A. C. (2016). SSD: Single shot multibox detector. In European conference on computer vision (pp. 21–37). Springer.
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S. & Guo, B. (2021). Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 10012–10022).
McCloskey, M. & Cohen, N. J. (1989). Catastrophic interference in connectionist networks: The sequential learning problem. In Psychology of learning and motivation (vol. 24, pp. 109–165). Elsevier.
Ostapenko, O., Puscas, M., Klein, T., Jahnichen, P. & Nabi, M. (2019). Learning to remember: A synaptic plasticity driven framework for continual learning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 11321–11329).
Perez-Rua, J. M., Zhu, X., Hospedales, T. M. & Xiang, T. (2020). Incremental few-shot object detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 13846–13855).
Ratcliff, R. (1990). Connectionist models of recognition memory: Constraints imposed by learning and forgetting functions. Psychological Review, 97(2), 285.
Rebuffi, S. A., Kolesnikov, A., Sperl, G. & Lampert, C. H. (2017). iCaRL: Incremental classifier and representation learning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2001–2010).
Redmon, J. & Farhadi, A. (2017). YOLO9000: Better, faster, stronger. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7263–7271).
Redmon, J. & Farhadi, A. (2018). YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767
Redmon, J., Divvala, S., Girshick, R. & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 779–788).
Ren, S., He, K., Girshick, R. & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems.
Romero, A., Ballas, N., Kahou, S. E., Chassang, A., Gatta, C. & Bengio, Y. (2014). Fitnets: Hints for thin deep nets. arXiv preprint arXiv:1412.6550
Shin, H., Lee, J. K., Kim, J., & Kim, J. (2017). Continual learning with deep generative replay. Advances in Neural Information Processing Systems, 30, 2990–2999.
Shmelkov, K., Schmid, C. & Alahari, K. (2017). Incremental learning of object detectors without catastrophic forgetting. In Proceedings of the IEEE international conference on computer vision (pp. 3400–3409).
Uijlings, J. R., Van De Sande, K. E., Gevers, T., & Smeulders, A. W. (2013). Selective search for object recognition. International Journal of Computer Vision, 104(2), 154–171.
Wu, Y., Chen, Y., Wang, L., Ye, Y., Liu, Z., Guo, Y. & Fu, Y. (2019). Large scale incremental learning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 374–382).
Xiang, Y., Fu, Y., Ji, P. & Huang, H. (2019). Incremental learning using conditional adversarial networks. In Proceedings of the IEEE international conference on computer vision (pp. 6619–6628).
Zhang, J., Zhang, J., Ghosh, S., Li, D., Tasci, S., Heck, L., Zhang, H. & Kuo, C. C. J. (2020). Class-incremental learning via deep model consolidation. In The IEEE winter conference on applications of computer vision (pp. 1131–1140).
Zhou, W., Chang, S., Sosa, N., Hamann, H. & Cox, D. (2020). Lifelong object detection. arXiv preprint arXiv:2009.01129
Zitnick, C. L. & Dollár, P. (2014). Edge boxes: Locating object proposals from edges. In European conference on computer vision (pp. 391–405). Springer.
Ethics declarations
Conflict of interest
The first author is funded by a scholarship from the China Scholarship Council (CSC). This research is supported in part by the National Research Foundation, Singapore under its AI Singapore Program (AISG Award No: AISG2-RP-2020-016), the Tier 2 grant MOET2EP20120-0011 from the Singapore Ministry of Education, and the Natural Science Foundation of China, Grant No. 61603372.
Additional information
Communicated by Oliver Zendel
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Na Dong: Work done fully at the National University of Singapore.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Dong, N., Zhang, Y., Ding, M. et al. Towards Non Co-occurrence Incremental Object Detection with Unlabeled In-the-Wild Data. Int J Comput Vis 132, 5066–5083 (2024). https://doi.org/10.1007/s11263-024-02048-0