Abstract
With the goal of detecting both the object categories that appear in the training phase and those never observed before testing, zero-shot object detection (ZSD) has become a challenging yet anticipated task in the community. Current approaches tackle this problem by drawing on the feature synthesis techniques used in the zero-shot image classification (ZSC) task, without delving into the problems inherent to ZSD. In this paper, we analyze the outstanding challenges that ZSD presents compared with ZSC, namely severe intra-class variation, complex category co-occurrence, and an open test scenario, and reveal how they interfere with the region feature synthesis process. In view of this, we propose a novel memory-based robust region feature synthesizer (M-RRFS) for ZSD, which is equipped with the Intra-class Semantic Diverging (IntraSD), Inter-class Structure Preserving (InterSP), and Cross-Domain Contrast Enhancing (CrossCE) mechanisms to overcome the problems of inadequate intra-class diversity, insufficient inter-class separability, and weak inter-domain contrast. Moreover, when designing the overall learning framework, we develop an asynchronous memory container (AMC) that explores the cross-domain relationship between the seen-class and unseen-class domains to reduce the overlap between their distributions. Based on AMC, a memory-assisted ZSD inference process is also proposed to further boost prediction accuracy. To evaluate the proposed approach, comprehensive experiments are conducted on the MS-COCO, PASCAL VOC, ILSVRC, and DIOR datasets, on which superior performance is achieved. Notably, we set new state-of-the-art results on the MS-COCO dataset, i.e., 64.0\(\%\), 60.9\(\%\), and 55.5\(\%\) Recall@100 with IoU \(= 0.4, 0.5, 0.6\) respectively, and 15.1\(\%\) mAP with IoU \(= 0.5\), under the 48/17 category split setting.
Meanwhile, the experiments on the DIOR dataset establish the first benchmark for evaluating zero-shot object detection on remote sensing images. Code is available at https://github.com/HPL123/M-RRFS.
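The headline numbers above are reported as Recall@100 at fixed IoU thresholds. For readers unfamiliar with the metric, the following is a minimal illustrative sketch (not the authors' code) of how Recall@K at a given IoU threshold is typically computed for a detection benchmark: keep the top-K scoring predictions per image and count the fraction of ground-truth boxes matched at or above the threshold.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def recall_at_k(predictions, ground_truths, k=100, iou_thr=0.5):
    """Fraction of ground-truth boxes recalled by the top-k scoring
    predictions per image. `predictions` maps image id -> list of
    (score, box); `ground_truths` maps image id -> list of boxes."""
    recalled, total = 0, 0
    for img_id, gts in ground_truths.items():
        total += len(gts)
        # Keep only the k highest-scoring predictions for this image.
        preds = sorted(predictions.get(img_id, []), reverse=True,
                       key=lambda p: p[0])[:k]
        matched = set()
        for _, pbox in preds:
            # Greedily match each prediction to the best unmatched GT box
            # whose overlap reaches the IoU threshold.
            best_j, best_iou = -1, iou_thr
            for j, gbox in enumerate(gts):
                if j in matched:
                    continue
                ov = iou(pbox, gbox)
                if ov >= best_iou:
                    best_j, best_iou = j, ov
            if best_j >= 0:
                matched.add(best_j)
        recalled += len(matched)
    return recalled / total if total else 0.0
```

For example, with one image containing two ground-truth boxes of which only one is overlapped by a prediction at IoU \(\ge 0.5\), `recall_at_k` returns 0.5. The greedy one-to-one matching here is a simplification; benchmark toolkits such as the official COCO evaluation implement additional details (e.g., per-category evaluation and crowd regions).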
Data Availability
The datasets generated during the current study are available in the MS COCO repository, https://cocodataset.org/#home, the PASCAL VOC repository, http://host.robots.ox.ac.uk/pascal/VOC/, and the DIOR repository, https://pan.baidu.com/s/1iLKT0JQoKXEJTGNxt5lSMg#list/path=%2F.
Additional information
Communicated by Esa Rahtu.
This work was supported in part by the Key-Area Research and Development Program of Guangdong Province (No. 2021B0101200001), the National Natural Science Foundation of China (Nos. U21B2048, 62322605, 62293543, 62202015), and the Institute of Artificial Intelligence, Hefei Comprehensive National Science Center Project Grant (21KT008).
About this article
Cite this article
Huang, P., Zhang, D., Cheng, D. et al. M-RRFS: A Memory-Based Robust Region Feature Synthesizer for Zero-Shot Object Detection. Int J Comput Vis 132, 4651–4672 (2024). https://doi.org/10.1007/s11263-024-02112-9