Abstract
Few-shot action recognition aims to address the high cost and impracticality of manually labeling complex and variable video data for action recognition. It requires accurately classifying human actions in videos using only a few labeled examples per class. Compared with few-shot learning on images, few-shot action recognition is more challenging because of the intrinsic complexity of video data. Numerous approaches have driven significant advances in few-shot action recognition, underscoring the need for a comprehensive survey. Unlike early surveys that focus on few-shot image or text classification, this survey is grounded in the unique challenges of few-shot action recognition. We provide a comprehensive review of recent methods and introduce a novel, systematic taxonomy of existing approaches, accompanied by detailed analysis. We categorize the methods into generative-based and meta-learning frameworks, and further elaborate on the methods within the meta-learning framework along three aspects: video instance representation, category prototype learning, and generalized video alignment. In addition, the survey presents the commonly used benchmarks and discusses relevant advanced topics and promising future directions. We hope this survey can serve as a valuable resource for researchers, offering essential guidance to newcomers and fresh insights to seasoned researchers.
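To make the problem setting concrete: in the N-way K-shot episodic protocol that underlies the meta-learning methods surveyed here, each episode provides K labeled support videos for each of N classes, and each query video must be assigned to one of those N classes. The sketch below illustrates a single episode with prototypical-network-style nearest-prototype matching over pre-computed video embeddings; the function name, the 512-dimensional embeddings, and the random toy data are illustrative assumptions, not components of any particular surveyed method.

```python
import torch

def classify_episode(support, support_labels, queries, n_way):
    """Nearest-prototype classification for one N-way episode.

    support:        (n_way * k_shot, d) embedded support videos
    support_labels: (n_way * k_shot,)   integer class ids in [0, n_way)
    queries:        (n_query, d)        embedded query videos
    """
    # Category prototype = mean embedding of that class's support videos.
    prototypes = torch.stack(
        [support[support_labels == c].mean(dim=0) for c in range(n_way)]
    )
    # Score each query against each prototype by negative Euclidean distance.
    scores = -torch.cdist(queries, prototypes)
    return scores.argmax(dim=1)  # predicted episode-local class id per query

# Toy 5-way 1-shot episode with random 512-d "video embeddings".
support = torch.randn(5, 512)
labels = torch.arange(5)
queries = torch.randn(10, 512)
print(classify_episode(support, labels, queries, n_way=5))
```

In practice the embeddings would come from a video backbone, and much of the surveyed work concerns how to build them (instance representation), how to aggregate them (prototype learning), and how to compare them across time (video alignment).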
Data Availability
This survey introduces commonly used datasets for few-shot action recognition (FSAR), summarized in Section 3.3. These publicly available datasets include HMDB (https://serre-lab.clps.brown.edu/resource/hmdb-a-large-human-motion-database), UCF101 (https://www.crcv.ucf.edu/data/UCF101.php), Kinetics (https://github.com/cvdfoundation/kinetics-dataset), SSv2 (https://20bn.com/datasets/something-something), and EPIC-Kitchens (https://epic-kitchens.github.io/2021).
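For readers assembling these benchmarks, a minimal sketch of episode construction is given below. It assumes the videos have been organized one directory per class (root/<class_name>/<clip>); that layout and the function name are our assumptions for illustration, while the actual FSAR train/val/test class splits for each dataset are those summarized in Section 3.3.

```python
import random
from pathlib import Path

def sample_episode(root, n_way=5, k_shot=1, n_query=5):
    """Draw one N-way K-shot episode from root/<class_name>/<clip>."""
    class_dirs = [p for p in Path(root).iterdir() if p.is_dir()]
    episode_classes = random.sample(class_dirs, n_way)
    support, query = [], []
    for label, cls in enumerate(episode_classes):
        clips = random.sample(sorted(cls.iterdir()), k_shot + n_query)
        support += [(path, label) for path in clips[:k_shot]]
        query += [(path, label) for path in clips[k_shot:]]
    return support, query  # lists of (clip_path, episode-local label)
```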
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grants U23A20387 and 62322212, in part by the Pengcheng Laboratory Research Project under Grant PCL2023A08, in part by the Alibaba Innovative Research Program, and in part by the CAS Project for Young Scientists in Basic Research (YSBR-116).
Additional information
Communicated by Yoichi Sato.
About this article
Cite this article
Wanyan, Y., Yang, X., Dong, W. et al. A Comprehensive Review of Few-Shot Action Recognition. Int J Comput Vis 133, 6832–6859 (2025). https://doi.org/10.1007/s11263-025-02503-6