Abstract
Deep networks have shown remarkable results on the task of object detection. However, their performance drops drastically when they are subsequently trained on novel classes without any samples from the base classes originally used to train the model. This phenomenon is known as catastrophic forgetting. Recently, several incremental learning methods have been proposed to mitigate catastrophic forgetting in object detection. Despite their effectiveness, these methods require the co-occurrence of the unlabeled base classes in the training data of the novel classes. This requirement is impractical in many real-world settings, since the base classes do not necessarily co-occur with the novel classes. In view of this limitation, we consider a more practical setting for object detection in which the base and novel classes never co-occur. We propose the use of unlabeled in-the-wild data to bridge the non co-occurrence caused by the missing base classes during the training of additional novel classes. To this end, we introduce a blind sampling strategy that uses the responses of the base-class model and a pre-trained novel-class model to select a smaller, relevant dataset from the large in-the-wild dataset for incremental learning. We then design a dual-teacher distillation framework to transfer the knowledge distilled from the base- and novel-class teacher models to the student model using the sampled in-the-wild data. Additionally, the novel-class data is included in the training to facilitate the learning of discriminative representations between the base and novel classes. Furthermore, considering that the training samples are all false positives when there is no class overlap in the in-the-wild data, we propose a single-teacher distillation framework to relieve the mutual suppression of the dual-teacher distillation framework and to balance the trade-off between the performance on the base and novel classes. Experimental results on the PASCAL VOC and MS-COCO datasets show that our proposed method significantly outperforms other state-of-the-art class-incremental object detection methods when there is no co-occurrence between the base and novel classes during training.
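The dual-teacher distillation described above can be illustrated with a minimal sketch. The snippet below is not the authors' implementation; it only shows, under assumed tensor shapes and hypothetical names, how a student's class responses on sampled in-the-wild proposals could be matched against the concatenated soft targets of a frozen base-class teacher and a frozen novel-class teacher via temperature-scaled knowledge distillation.

```python
# Minimal sketch (not the authors' implementation) of a dual-teacher
# distillation loss: a frozen base-class teacher and a frozen novel-class
# teacher both score the sampled in-the-wild proposals, and the student is
# trained to match their concatenated class responses. All names, shapes,
# and the temperature value are illustrative assumptions.
import torch
import torch.nn.functional as F

def dual_teacher_distillation_loss(student_logits,        # (N, C_base + C_novel)
                                    base_teacher_logits,   # (N, C_base), frozen teacher
                                    novel_teacher_logits,  # (N, C_novel), frozen teacher
                                    temperature=2.0):
    # Concatenate the two teachers' responses over the class dimension to form
    # a single soft target that covers both base and novel classes.
    teacher_logits = torch.cat([base_teacher_logits, novel_teacher_logits], dim=1)
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    # Standard temperature-scaled KL distillation; the T^2 factor keeps
    # gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

# Example usage on random proposal scores (20 base + 10 novel classes).
if __name__ == "__main__":
    student = torch.randn(8, 30)
    base_t, novel_t = torch.randn(8, 20), torch.randn(8, 10)
    print(dual_teacher_distillation_loss(student, base_t, novel_t))
```

In the single-teacher variant mentioned in the abstract, only one teacher's responses would supply the soft target for a given sample, which is one way to avoid the two distillation signals suppressing each other; the exact formulation is given in the paper itself.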
Data availability
The datasets analysed during the current study are available as follows:
1. MS-COCO 2014 (Lin et al., 2014): https://cocodataset.org/#home
2. PASCAL VOC 2007 (Everingham et al., 2010): http://host.robots.ox.ac.uk/pascal/VOC/index.html
References
Castro, F. M., Marín-Jiménez, M. J., Guil, N., Schmid, C. & Alahari, K. (2018). End-to-end incremental learning. In Proceedings of the European conference on computer vision (ECCV) (pp. 233–248).
Chen, L., Yu, C. & Chen, L. (2019). A new knowledge distillation for incremental object detection. In 2019 international joint conference on neural networks (IJCNN) (pp. 1–7). IEEE.
Dai, J., Li, Y., He, K. & Sun, J. (2016). R-FCN: Object detection via region-based fully convolutional networks. In Advances in neural information processing systems (pp. 379–387).
Dhar, P., Singh, R. V., Peng, K. C., Wu, Z. & Chellappa, R. (2019). Learning without memorizing. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5138–5146).
Dong, N., Zhang, Y., Ding, M., & Lee, G. H. (2021). Bridging non co-occurrence with unlabeled in-the-wild data for incremental object detection. Advances in Neural Information Processing Systems, 34, 30492–30503.
Everingham, M., Van Gool, L., Williams, C. K., Winn, J., & Zisserman, A. (2010). The pascal visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.
Gidaris, S. & Komodakis, N. (2015). Object detection via a multi-region and semantic segmentation-aware CNN model. In Proceedings of the IEEE international conference on computer vision (pp. 1134–1142).
Girshick, R. (2015). Fast R-CNN. In Proceedings of the IEEE international conference on computer vision (pp. 1440–1448).
Girshick, R., Donahue, J., Darrell, T. & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 580–587).
Hao, Y., Fu, Y., Jiang, Y. G. & Tian, Q. (2019). An end-to-end architecture for class-incremental object detection with knowledge distillation. In 2019 IEEE international conference on multimedia and expo (ICME) (pp. 1–6). IEEE.
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9), 1904–1916.
Kemker, R. & Kanan, C. (2017). FearNet: Brain-inspired model for incremental learning. arXiv preprint arXiv:1711.10563
Kirkpatrick, J., Pascanu, R., Rabinowitz, N., Veness, J., Desjardins, G., Rusu, A. A., Milan, K., Quan, J., Ramalho, T., Grabska-Barwinska, A., et al. (2017). Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13), 3521–3526.
Li, Z., & Hoiem, D. (2017). Learning without forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(12), 2935–2947.
Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P. & Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. In European conference on computer vision (pp. 740–755). Springer.
Lin, T. Y., Dollár, P., Girshick, R., He, K., Hariharan, B. & Belongie, S. (2017a). Feature pyramid networks for object detection. In IEEE conference on computer vision and pattern recognition.
Lin, T. Y., Goyal, P., Girshick, R., He, K. & Dollár, P. (2017b). Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision (pp. 2980–2988).
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C. Y. & Berg, A. C. (2016). SSD: Single shot multibox detector. In European conference on computer vision (pp. 21–37). Springer.
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S. & Guo, B. (2021). Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 10012–10022).
McCloskey, M. & Cohen, N. J. (1989). Catastrophic interference in connectionist networks: The sequential learning problem. In Psychology of learning and motivation (vol. 24, pp. 109–165). Elsevier.
Ostapenko, O., Puscas, M., Klein, T., Jahnichen, P. & Nabi, M. (2019). Learning to remember: A synaptic plasticity driven framework for continual learning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 11321–11329).
Perez-Rua, J. M., Zhu, X., Hospedales, T. M. & Xiang, T. (2020). Incremental few-shot object detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 13846–13855).
Ratcliff, R. (1990). Connectionist models of recognition memory: Constraints imposed by learning and forgetting functions. Psychological Review, 97(2), 285.
Rebuffi, S. A., Kolesnikov, A., Sperl, G. & Lampert, C. H. (2017). iCaRL: Incremental classifier and representation learning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2001–2010).
Redmon, J. & Farhadi, A. (2017). YOLO9000: Better, faster, stronger. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7263–7271).
Redmon, J. & Farhadi, A. (2018). YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767
Redmon, J., Divvala, S., Girshick, R. & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 779–788).
Ren, S., He, K., Girshick, R. & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems.
Romero, A., Ballas, N., Kahou, S. E., Chassang, A., Gatta, C. & Bengio, Y. (2014). Fitnets: Hints for thin deep nets. arXiv preprint arXiv:1412.6550
Shin, H., Lee, J. K., Kim, J., & Kim, J. (2017). Continual learning with deep generative replay. Advances in Neural Information Processing Systems, 30, 2990–2999.
Shmelkov, K., Schmid, C. & Alahari, K. (2017). Incremental learning of object detectors without catastrophic forgetting. In Proceedings of the IEEE international conference on computer vision (pp. 3400–3409).
Uijlings, J. R., Van De Sande, K. E., Gevers, T., & Smeulders, A. W. (2013). Selective search for object recognition. International Journal of Computer Vision, 104(2), 154–171.
Wu, Y., Chen, Y., Wang, L., Ye, Y., Liu, Z., Guo, Y. & Fu, Y. (2019). Large scale incremental learning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 374–382).
Xiang, Y., Fu, Y., Ji, P. & Huang, H. (2019). Incremental learning using conditional adversarial networks. In Proceedings of the IEEE international conference on computer vision (pp. 6619–6628).
Zhang, J., Zhang, J., Ghosh, S., Li, D., Tasci, S., Heck, L., Zhang, H. & Kuo, C. C. J. (2020). Class-incremental learning via deep model consolidation. In The IEEE winter conference on applications of computer vision (pp. 1131–1140).
Zhou, W., Chang, S., Sosa, N., Hamann, H. & Cox, D. (2020). Lifelong object detection. arXiv preprint arXiv:2009.01129
Zitnick, C. L. & Dollár, P. (2014). Edge boxes: Locating object proposals from edges. In European conference on computer vision (pp. 391–405). Springer.
Ethics declarations
Conflict of interest
The first author is funded by a scholarship from the China Scholarship Council (CSC). This research is supported in part by the National Research Foundation, Singapore under its AI Singapore Program (AISG Award No: AISG2-RP-2020-016), the Tier 2 grant MOET2EP20120-0011 from the Singapore Ministry of Education, and the Natural Science Foundation of China, Grant No. 61603372.
Additional information
Communicated by Oliver Zendel
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Na Dong: Work done fully at the National University of Singapore.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Dong, N., Zhang, Y., Ding, M. et al. Towards Non Co-occurrence Incremental Object Detection with Unlabeled In-the-Wild Data. Int J Comput Vis 132, 5066–5083 (2024). https://doi.org/10.1007/s11263-024-02048-0