Abstract
With the goal of detecting both the object categories that appear in the training phase and those never observed before testing, zero-shot object detection (ZSD) has become a challenging yet anticipated task in the community. Current approaches tackle this problem by drawing on the feature synthesis techniques used in the zero-shot image classification (ZSC) task, without delving into the problems inherent to ZSD. In this paper, we analyze the outstanding challenges that ZSD presents compared with ZSC, namely severe intra-class variation, complex category co-occurrence, and an open test scenario, and reveal how they interfere with the region feature synthesis process. In view of this, we propose a novel memory-based robust region feature synthesizer (M-RRFS) for ZSD, which is equipped with the Intra-class Semantic Diverging (IntraSD), Inter-class Structure Preserving (InterSP), and Cross-Domain Contrast Enhancing (CrossCE) mechanisms to overcome the problems of inadequate intra-class diversity, insufficient inter-class separability, and weak inter-domain contrast. Moreover, when designing the overall learning framework, we develop an asynchronous memory container (AMC) that explores the cross-domain relationship between the seen-class and unseen-class domains to reduce the overlap between their distributions. Based on AMC, a memory-assisted ZSD inference process is also proposed to further boost prediction accuracy. To evaluate the proposed approach, comprehensive experiments are conducted on the MS-COCO, PASCAL VOC, ILSVRC, and DIOR datasets, on which superior performance is achieved. Notably, we set new state-of-the-art results on the MS-COCO dataset, i.e., 64.0\(\%\), 60.9\(\%\), and 55.5\(\%\) Recall@100 with IoU \(= 0.4, 0.5, 0.6\) respectively, and 15.1\(\%\) mAP with IoU \(= 0.5\), under the 48/17 category split setting.
Meanwhile, the experiments on the DIOR dataset establish the first benchmark for evaluating zero-shot object detection on remote sensing images. Code is available at https://github.com/HPL123/M-RRFS.
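The headline numbers above are reported as Recall@100 at fixed IoU thresholds. For readers unfamiliar with the metric, the following is a minimal illustrative sketch (not the authors' code) of how Recall@K at a given IoU threshold is typically computed for a detection benchmark: keep the top-K scoring predictions per image and count the fraction of ground-truth boxes matched at or above the threshold.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def recall_at_k(predictions, ground_truths, k=100, iou_thr=0.5):
    """Fraction of ground-truth boxes recalled by the top-k scoring
    predictions per image. `predictions` maps image id -> list of
    (score, box); `ground_truths` maps image id -> list of boxes."""
    recalled, total = 0, 0
    for img_id, gts in ground_truths.items():
        total += len(gts)
        # Keep only the k highest-scoring predictions for this image.
        preds = sorted(predictions.get(img_id, []), reverse=True,
                       key=lambda p: p[0])[:k]
        matched = set()
        for _, pbox in preds:
            # Greedily match each prediction to the best unmatched GT box
            # whose overlap reaches the IoU threshold.
            best_j, best_iou = -1, iou_thr
            for j, gbox in enumerate(gts):
                if j in matched:
                    continue
                ov = iou(pbox, gbox)
                if ov >= best_iou:
                    best_j, best_iou = j, ov
            if best_j >= 0:
                matched.add(best_j)
        recalled += len(matched)
    return recalled / total if total else 0.0
```

For example, with one image containing two ground-truth boxes of which only one is overlapped by a prediction at IoU \(\ge 0.5\), `recall_at_k` returns 0.5. The greedy one-to-one matching here is a simplification; benchmark toolkits such as the official COCO evaluation implement additional details (e.g., per-category evaluation and crowd regions).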
Data Availability
The datasets generated during the current study are available in the MS COCO repository, https://cocodataset.org/#home, the PASCAL VOC repository, http://host.robots.ox.ac.uk/pascal/VOC/, and the DIOR repository, https://pan.baidu.com/s/1iLKT0JQoKXEJTGNxt5lSMg#list/path=%2F.
Additional information
Communicated by Esa Rahtu.
This work was supported in part by the Key-Area Research and Development Program of Guangdong Province (No. 2021B0101200001), the National Natural Science Foundation of China (Nos. U21B2048, 62322605, 62293543, 62202015), and the Institute of Artificial Intelligence, Hefei Comprehensive National Science Center Project Grant (21KT008).
About this article
Cite this article
Huang, P., Zhang, D., Cheng, D. et al. M-RRFS: A Memory-Based Robust Region Feature Synthesizer for Zero-Shot Object Detection. Int J Comput Vis 132, 4651–4672 (2024). https://doi.org/10.1007/s11263-024-02112-9