
Towards Non Co-occurrence Incremental Object Detection with Unlabeled In-the-Wild Data

Published in: International Journal of Computer Vision

Abstract

Deep networks have shown remarkable results in the task of object detection. However, their performance drops critically when they are subsequently trained on novel classes without any samples from the base classes originally used to train the model. This phenomenon is known as catastrophic forgetting. Several incremental learning methods have recently been proposed to mitigate catastrophic forgetting for object detection. Despite their effectiveness, these methods require the unlabeled base classes to co-occur in the training data of the novel classes. This requirement is impractical in many real-world settings since the base classes do not necessarily co-occur with the novel classes. In view of this limitation, we consider the more practical setting in which the base and novel classes never co-occur for the object detection task. We propose the use of unlabeled in-the-wild data to bridge the non co-occurrence caused by the missing base classes during the training of additional novel classes. To this end, we introduce a blind sampling strategy based on the responses of the base-class model and a pre-trained novel-class model to select a smaller relevant dataset from the large in-the-wild dataset for incremental learning. We then design a dual-teacher distillation framework to transfer the knowledge distilled from the base- and novel-class teacher models to the student model using the sampled in-the-wild data. Additionally, the novel-class data is included in the training to facilitate the learning of discriminative representations between the base and novel classes. Furthermore, considering that the training samples are all false positives when there is no class overlap in the in-the-wild data, we propose a single-teacher distillation framework that relieves the mutual suppression of the dual-teacher framework and balances the trade-off between base- and novel-class performance. Experimental results on the PASCAL VOC and MS-COCO datasets show that our proposed method significantly outperforms other state-of-the-art class-incremental object detection methods when there is no co-occurrence between the base and novel classes during training.
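The abstract describes the blind sampling and dual-teacher distillation steps only at a high level. As a rough illustration of their overall shape, the following Python sketch assumes a hypothetical detector interface (a detect() call returning per-box confidence scores), a student classification head whose logits are laid out as base classes followed by novel classes, and an arbitrary confidence threshold tau; none of these details are taken from the paper itself.

# A minimal, hypothetical sketch of the two ideas summarised in the abstract:
# (1) blind sampling of in-the-wild images from the responses of the frozen
#     base-class and novel-class teacher detectors, and
# (2) dual-teacher distillation of their predictions into one student model.
# The detector interface, logit layout, and threshold are assumptions made
# for illustration only, not the authors' implementation.
import torch
import torch.nn.functional as F


def blind_sample(images, base_teacher, novel_teacher, tau=0.5):
    """Keep in-the-wild images to which either frozen teacher responds confidently."""
    selected = []
    for img in images:
        base_scores = base_teacher.detect(img)["scores"]    # per-box confidences (1-D tensor)
        novel_scores = novel_teacher.detect(img)["scores"]
        base_hit = base_scores.numel() > 0 and float(base_scores.max()) > tau
        novel_hit = novel_scores.numel() > 0 and float(novel_scores.max()) > tau
        if base_hit or novel_hit:
            selected.append(img)
    return selected


def dual_teacher_distill_loss(student_logits, base_logits, novel_logits,
                              n_base, n_novel, T=2.0):
    """Distil the base-class slice of the student's classification logits from
    the base teacher and the novel-class slice from the novel teacher,
    using soft targets at temperature T."""
    s_base = student_logits[:, :n_base]
    s_novel = student_logits[:, n_base:n_base + n_novel]
    loss_base = F.kl_div(F.log_softmax(s_base / T, dim=1),
                         F.softmax(base_logits / T, dim=1),
                         reduction="batchmean") * T * T
    loss_novel = F.kl_div(F.log_softmax(s_novel / T, dim=1),
                          F.softmax(novel_logits / T, dim=1),
                          reduction="batchmean") * T * T
    return loss_base + loss_novel

The single-teacher variant mentioned in the abstract would replace the two soft-target terms with one; the abstract does not provide enough detail to reproduce either loss exactly, so treat this purely as an illustrative sketch rather than the authors' method.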


Data availability

The datasets analysed during the current study are available as follows:
1. MS-COCO 2014 (Lin et al., 2014): https://cocodataset.org/#home
2. PASCAL VOC 2007 (Everingham et al., 2010): http://host.robots.ox.ac.uk/pascal/VOC/index.html

References

  • Castro, F. M., Marín-Jiménez, M. J., Guil, N., Schmid, C. & Alahari, K. (2018). End-to-end incremental learning. In Proceedings of the European conference on computer vision (ECCV) (pp. 233–248).

  • Chen, L., Yu, C. & Chen, L. (2019). A new knowledge distillation for incremental object detection. In 2019 international joint conference on neural networks (IJCNN) (pp. 1–7). IEEE.

  • Dai, J., Li, Y., He, K. & Sun, J. (2016). R-FCN: Object detection via region-based fully convolutional networks. In Advances in neural information processing systems (pp. 379–387).

  • Dhar, P., Singh, R. V., Peng, K. C., Wu, Z. & Chellappa, R. (2019). Learning without memorizing. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5138–5146).

  • Dong, N., Zhang, Y., Ding, M., & Lee, G. H. (2021). Bridging non co-occurrence with unlabeled in-the-wild data for incremental object detection. Advances in Neural Information Processing Systems, 34, 30492–30503.

  • Everingham, M., Van Gool, L., Williams, C. K., Winn, J., & Zisserman, A. (2010). The pascal visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.

  • Gidaris, S. & Komodakis, N. (2015). Object detection via a multi-region and semantic segmentation-aware CNN model. In Proceedings of the IEEE international conference on computer vision (pp. 1134–1142).

  • Girshick, R. (2015). Fast R-CNN. In Proceedings of the IEEE international conference on computer vision (pp. 1440–1448).

  • Girshick, R., Donahue, J., Darrell, T. & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 580–587).

  • Hao, Y., Fu, Y., Jiang, Y. G. & Tian, Q. (2019). An end-to-end architecture for class-incremental object detection with knowledge distillation. In 2019 IEEE international conference on multimedia and expo (ICME) (pp. 1–6). IEEE.

  • He, K., Zhang, X., Ren, S., & Sun, J. (2014). Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9), 1904–1916.

  • Kemker, R. & Kanan, C. (2017). Fearnet: Brain-inspired model for incremental learning. arXiv preprint arXiv:1711.10563

  • Kirkpatrick, J., Pascanu, R., Rabinowitz, N., Veness, J., Desjardins, G., Rusu, A. A., Milan, K., Quan, J., Ramalho, T., Grabska-Barwinska, A., et al. (2017). Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13), 3521–3526.

  • Li, Z., & Hoiem, D. (2017). Learning without forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(12), 2935–2947.

  • Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P. & Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. In European conference on computer vision (pp. 740–755). Springer.

  • Lin, T. Y., Dollár, P., Girshick, R., He, K., Hariharan, B. & Belongie, S. (2017a). Feature pyramid networks for object detection. In IEEE conference on computer vision and pattern recognition.

  • Lin, T. Y., Goyal, P., Girshick, R., He, K. & Dollár, P. (2017b). Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision (pp. 2980–2988).

  • Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C. Y. & Berg, A. C. (2016). SSD: Single shot multibox detector. In ECCV (pp. 21–37). Springer.

  • Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S. & Guo, B. (2021). Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 10012–10022).

  • McCloskey, M. & Cohen, N. J. (1989). Catastrophic interference in connectionist networks: The sequential learning problem. In Psychology of learning and motivation (vol. 24, pp. 109–165). Elsevier.

  • Ostapenko, O., Puscas, M., Klein, T., Jahnichen, P. & Nabi, M. (2019). Learning to remember: A synaptic plasticity driven framework for continual learning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 11321–11329).

  • Perez-Rua, J. M., Zhu, X., Hospedales, T. M. & Xiang, T. (2020). Incremental few-shot object detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 13846–13855).

  • Ratcliff, R. (1990). Connectionist models of recognition memory: Constraints imposed by learning and forgetting functions. Psychological Review, 97(2), 285.

  • Rebuffi, S. A., Kolesnikov, A., Sperl, G. & Lampert, C. H. (2017). iCaRL: Incremental classifier and representation learning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2001–2010).

  • Redmon, J. & Farhadi, A. (2017). YOLO9000: Better, faster, stronger. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7263–7271).

  • Redmon, J. & Farhadi, A. (2018). YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767

  • Redmon, J., Divvala, S., Girshick, R. & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 779–788).

  • Ren, S., He, K., Girshick, R. & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems.

  • Romero, A., Ballas, N., Kahou, S. E., Chassang, A., Gatta, C. & Bengio, Y. (2014). Fitnets: Hints for thin deep nets. arXiv preprint arXiv:1412.6550

  • Shin, H., Lee, J. K., Kim, J., & Kim, J. (2017). Continual learning with deep generative replay. Advances in Neural Information Processing Systems, 30, 2990–2999.

  • Shmelkov, K., Schmid, C. & Alahari, K. (2017). Incremental learning of object detectors without catastrophic forgetting. In Proceedings of the IEEE international conference on computer vision (pp. 3400–3409).

  • Uijlings, J. R., Van De Sande, K. E., Gevers, T., & Smeulders, A. W. (2013). Selective search for object recognition. International Journal of Computer Vision, 104(2), 154–171.

  • Wu, Y., Chen, Y., Wang, L., Ye, Y., Liu, Z., Guo, Y. & Fu, Y. (2019). Large scale incremental learning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 374–382).

  • Xiang, Y., Fu, Y., Ji, P. & Huang, H. (2019). Incremental learning using conditional adversarial networks. In Proceedings of the IEEE international conference on computer vision (pp. 6619–6628).

  • Zhang, J., Zhang, J., Ghosh, S., Li, D., Tasci, S., Heck, L., Zhang, H. & Kuo, C. C. J. (2020). Class-incremental learning via deep model consolidation. In The IEEE winter conference on applications of computer vision (pp. 1131–1140).

  • Zhou, W., Chang, S., Sosa, N., Hamann, H. & Cox, D. (2020). Lifelong object detection. arXiv preprint arXiv:2009.01129

  • Zitnick, C. L. & Dollár, P. (2014). Edge boxes: Locating object proposals from edges. In European conference on computer vision (pp. 391–405). Springer.

Author information

Corresponding authors

Correspondence to Mingli Ding or Gim Hee Lee.

Ethics declarations

Conflict of interest

The first author is funded by a scholarship from the China Scholarship Council (CSC). This research is supported in part by the National Research Foundation, Singapore under its AI Singapore Program (AISG Award No: AISG2-RP-2020-016), the Tier 2 grant MOET2EP20120-0011 from the Singapore Ministry of Education, and the Natural Science Foundation of China, Grant No. 61603372.

Additional information

Communicated by Oliver Zendel

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Na Dong: Work done fully at the National University of Singapore.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Dong, N., Zhang, Y., Ding, M. et al. Towards Non Co-occurrence Incremental Object Detection with Unlabeled In-the-Wild Data. Int J Comput Vis 132, 5066–5083 (2024). https://doi.org/10.1007/s11263-024-02048-0

  • DOI: https://doi.org/10.1007/s11263-024-02048-0
