Abstract
Video person re-identification (VReID) aims to recognize individuals across video sequences. Existing methods primarily use Euclidean space for representation learning but struggle to capture complex hierarchical structures, especially in scenarios with occlusions and background clutter. In contrast, hyperbolic space, with its negatively curved geometry, excels at preserving hierarchical relationships and at discriminating between similar appearances. Inspired by these observations, we propose Dual-Space Video Person Re-Identification (DS-VReID), which exploits the strengths of both Euclidean and hyperbolic geometries: it captures visual features while also exploring their intrinsic hierarchical relations, thereby enhancing the discriminative capacity of the learned representations. Specifically, we design the Dynamic Prompt Graph Construction (DPGC) module, which uses a pre-trained CLIP model with learnable dynamic prompts to construct 3D graphs that capture subtle changes and dynamic information in video sequences. Building upon this, we introduce the Hyperbolic Disentangled Aggregation (HDA) module, which addresses long-range dependency modeling by decoupling node distances and integrating adjacency matrices, capturing detailed spatial-temporal hierarchical relationships. Extensive experiments on benchmark datasets demonstrate the superiority of DS-VReID over state-of-the-art methods, showcasing its potential in complex VReID scenarios.
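To make the geometric intuition concrete: hyperbolic representation learning of the kind the abstract refers to is typically built on the Poincaré-ball model, where the geodesic distance between two embeddings grows rapidly as points approach the boundary of the unit ball, which is what lets the space encode tree-like hierarchies. The sketch below is a minimal, illustrative implementation of the standard Poincaré distance (curvature -1); it is not the paper's HDA module, and the function name is our own.

```python
import numpy as np

def poincare_distance(u: np.ndarray, v: np.ndarray) -> float:
    """Geodesic distance between two points in the Poincare ball (curvature -1).

    Both points must lie strictly inside the unit ball (||x|| < 1).
    """
    sq_u = float(np.dot(u, u))        # squared norm of u
    sq_v = float(np.dot(v, v))        # squared norm of v
    sq_diff = float(np.dot(u - v, u - v))
    # For valid inputs the arcosh argument is always >= 1.
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v))
    return float(np.arccosh(arg))

# The same Euclidean gap is "longer" near the boundary than near the origin,
# which is the property that makes the space suit hierarchical structure:
near_center = poincare_distance(np.array([0.0, 0.0]), np.array([0.1, 0.0]))
near_edge = poincare_distance(np.array([0.85, 0.0]), np.array([0.95, 0.0]))
```

In a hierarchy embedded this way, coarse concepts sit near the origin and fine-grained ones near the boundary, so siblings deep in the tree remain well separated even when their coordinates are close in the Euclidean sense.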
Data Availability
The datasets for this study can be found at the following locations:
MARS: http://zheng-lab.cecs.anu.edu.au/Project/project_mars.html
iLIDS-VID: https://xiatian-zhu.github.io/downloads_qmul_iLIDS-VID_ReID_dataset.html
PRID2011: https://www.tugraz.at/institute/icg/research/team-bischof/lrs/downloads/PRID11/
DukeMTMC-VideoReID: http://vision.cs.duke.edu/DukeMTMC
LS-VID: The dataset generated and/or analysed during the current study is not publicly available due to the LS-VID release agreement.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grants No. 62472060, 62441601, U23A20318, 62206035, and 62221005; in part by the Natural Science Foundation of Chongqing under Grants No. CSTB2022NSCQMSX1024, CSTB2024NSCQQCXMX0060, and CSTB2023NSCQLZX0061; in part by the Science and Technology Research Program of Chongqing Municipal Education Commission under Grant No. KJZDK202300604; in part by the Science and Technology Innovation Key R&D Program of Chongqing under Grant No. CSTB2023TIADSTX0016; and in part by the Chongqing Institute for Brain and Intelligence.
Additional information
Communicated by Bumsub Ham.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Leng, J., Kuang, C., Li, S. et al. Dual-Space Video Person Re-identification. Int J Comput Vis 133, 3667–3688 (2025). https://doi.org/10.1007/s11263-025-02350-5