
Dual-Space Video Person Re-identification

Published in: International Journal of Computer Vision

Abstract

Video person re-identification (VReID) aims to recognize individuals across video sequences. Existing methods primarily use Euclidean space for representation learning but struggle to capture complex hierarchical structures, especially in scenarios with occlusions and background clutter. In contrast, hyperbolic space, with its negatively curved geometry, excels at preserving hierarchical relationships and enhancing discrimination between similar appearances. Inspired by these properties, we propose Dual-Space Video Person Re-Identification (DS-VReID), which leverages the strengths of both Euclidean and hyperbolic geometries: it captures visual features while also exploring their intrinsic hierarchical relations, thereby enhancing the discriminative capacity of the learned representations. Specifically, we design the Dynamic Prompt Graph Construction (DPGC) module, which uses a pre-trained CLIP model with learnable dynamic prompts to construct 3D graphs that capture subtle changes and dynamic information in video sequences. Building upon this, we introduce the Hyperbolic Disentangled Aggregation (HDA) module, which addresses long-range dependency modeling by decoupling node distances and integrating adjacency matrices, capturing detailed spatial-temporal hierarchical relationships. Extensive experiments on benchmark datasets demonstrate the superiority of DS-VReID over state-of-the-art methods, showcasing its potential in complex VReID scenarios.
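The abstract's core claim is that hyperbolic geometry separates hierarchically related but visually similar samples better than Euclidean geometry. A minimal sketch of why, assuming the standard Poincaré-ball model with curvature −1 (the exact manifold and mapping used by DS-VReID are not specified here; `exp_map_zero` and `poincare_distance` are standard textbook formulas, not the authors' implementation):

```python
import numpy as np

def exp_map_zero(x, eps=1e-9):
    """Map a Euclidean (tangent-space) feature into the Poincaré ball.
    The result always has norm < 1, i.e. lies inside the unit ball."""
    norm = np.linalg.norm(x) + eps
    return np.tanh(norm) * x / norm

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points of the Poincaré ball
    (curvature -1): arccosh(1 + 2|u-v|^2 / ((1-|u|^2)(1-|v|^2)))."""
    sq_dist = np.sum((u - v) ** 2)
    denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return np.arccosh(1 + 2 * sq_dist / (denom + eps))

# Two hypothetical frame-level features, projected into the ball.
a = exp_map_zero(np.array([0.1, 0.2]))
b = exp_map_zero(np.array([0.4, -0.3]))

euclidean = np.linalg.norm(a - b)
hyperbolic = poincare_distance(a, b)
# The hyperbolic distance exceeds the Euclidean one, and the gap grows
# rapidly as points approach the ball's boundary -- this expansion is
# what lets hyperbolic embeddings keep tree-like hierarchies separated.
```

The dual-space idea is then to score pairs with both metrics: Euclidean distance for appearance similarity, hyperbolic distance for hierarchical structure.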



Data Availability

The datasets used in this study can be found at the following locations:

  • MARS: http://zheng-lab.cecs.anu.edu.au/Project/project_mars.html
  • iLIDS-VID: https://xiatian-zhu.github.io/downloads_qmul_iLIDS-VID_ReID_dataset.html
  • PRID2011: https://www.tugraz.at/institute/icg/research/team-bischof/lrs/downloads/PRID11/
  • DukeMTMC-VideoReID: http://vision.cs.duke.edu/DukeMTMC
  • LS-VID: the dataset generated and/or analysed during the current study is not publicly available due to the LS-VID release agreement.


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants No. 62472060, 62441601, U23A20318, 62206035, and 62221005; in part by the Natural Science Foundation of Chongqing under Grants No. CSTB2022NSCQMSX1024, CSTB2024NSCQQCXMX0060, and CSTB2023NSCQLZX0061; in part by the Science and Technology Research Program of Chongqing Municipal Education Commission under Grant No. KJZDK202300604; in part by the Science and Technology Innovation Key R&D Program of Chongqing under Grant No. CSTB2023TIADSTX0016; and in part by the Chongqing Institute for Brain and Intelligence.

Author information

Corresponding author

Correspondence to Xinbo Gao.

Additional information

Communicated by Bumsub Ham.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Leng, J., Kuang, C., Li, S. et al. Dual-Space Video Person Re-identification. Int J Comput Vis 133, 3667–3688 (2025). https://doi.org/10.1007/s11263-025-02350-5
