Hierarchical Skeleton Meta-Prototype Contrastive Learning with Hard Skeleton Mining for Unsupervised Person Re-identification

Rao, Haocong; Leung, Cyril; Miao, Chunyan

doi:10.1007/s11263-023-01864-0

Published: 29 August 2023

Volume 132, pages 238–260 (2024)

International Journal of Computer Vision

Abstract

With rapid advancements in depth sensors and deep learning, skeleton-based person re-identification (re-ID) models have recently achieved remarkable progress with many advantages. Most existing solutions learn single-level skeleton features from body joints with the assumption of equal skeleton importance, while they typically lack the ability to exploit more informative skeleton features from various levels such as limb level with more global body patterns. The label dependency of these methods also limits their flexibility in learning more general skeleton representations. This paper proposes a generic unsupervised Hierarchical skeleton Meta-Prototype Contrastive learning (Hi-MPC) approach with Hard Skeleton Mining (HSM) for person re-ID with unlabeled 3D skeletons. Firstly, we construct hierarchical representations of skeletons to model coarse-to-fine body and motion features from the levels of body joints, components, and limbs. Then a hierarchical meta-prototype contrastive learning model is proposed to cluster and contrast the most typical skeleton features (“prototypes”) from different-level skeletons. By converting original prototypes into meta-prototypes with multiple homogeneous transformations, we induce the model to learn the inherent consistency of prototypes to capture more effective skeleton features for person re-ID. Furthermore, we devise a hard skeleton mining mechanism to adaptively infer the informative importance of each skeleton, so as to focus on harder skeletons to learn more discriminative skeleton representations. Extensive evaluations on five datasets demonstrate that our approach outperforms a wide variety of state-of-the-art skeleton-based methods. We further show the general applicability of our method to cross-view person re-ID and RGB-based scenarios with estimated skeletons.
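
The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy 6-joint layout, the averaging-based pooling, the centroid prototypes, the InfoNCE-style loss, and the softmax hardness weights are all assumptions chosen for illustration, and every function name is hypothetical. The meta-prototype transformations are not covered. Cluster labels are assumed to be given (for example from a clustering step) and to be integers 0..K-1.

```python
import numpy as np

# Hypothetical groupings for a toy 6-joint skeleton (assumption); real
# datasets have their own joint layouts and body partitions.
JOINT_TO_COMPONENT = [[0, 1], [2, 3], [4, 5]]  # 3 components
COMPONENT_TO_LIMB = [[0], [1, 2]]              # 2 limbs


def pool(nodes, groups):
    """Average the 3D positions of the nodes in each group."""
    return np.stack([nodes[g].mean(axis=0) for g in groups])


def hierarchy(joints):
    """Return joint-, component-, and limb-level views of one skeleton."""
    components = pool(joints, JOINT_TO_COMPONENT)
    limbs = pool(components, COMPONENT_TO_LIMB)
    return joints, components, limbs


def prototypes(features, labels):
    """One prototype per cluster: the mean feature of its members."""
    return np.stack([features[labels == k].mean(axis=0)
                     for k in np.unique(labels)])


def per_sample_contrastive_loss(features, labels, protos, temperature=0.1):
    """InfoNCE-style loss of each feature against all cluster prototypes."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    logits = f @ p.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels]


def hard_weights(per_sample):
    """Softmax over losses: harder samples (larger loss) get larger weight."""
    w = np.exp(per_sample - per_sample.max())
    return w / w.sum()


def weighted_loss(per_sample, weights=None):
    """Plain mean if no weights are given, else a weighted mean."""
    if weights is None:
        return float(per_sample.mean())
    return float((weights * per_sample).sum() / weights.sum())
```

In this sketch the same loss would be computed separately on features from each level (joints, components, limbs) and then combined; how the levels are fused in the actual method is not stated in the abstract.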

Availability of data, materials, and codes

All data are available at https://github.com/Kali-Hac/Hi-MPC.

Funding

This research is supported by the National Research Foundation, Singapore under its AI Singapore Programme (AISG Award No: AISG2-PhD/2022-01-034[T]).

Author information

Authors and Affiliations

School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
Haocong Rao & Chunyan Miao
Joint NTU-UBC Research Centre of Excellence in Active Living for the Elderly (LILY), Nanyang Technological University, Singapore, Singapore
Haocong Rao, Cyril Leung & Chunyan Miao
Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada
Cyril Leung

Corresponding author

Correspondence to Chunyan Miao.

Ethics declarations

Conflict of interest

The authors declare they have no conflict of interest.

Ethical Statements

The datasets used in our work are officially shared by reliable research agencies, which guarantee that the collecting, processing, releasing, and using of data have gained the formal consent of participants. To protect privacy, all individuals are anonymized with simple identity numbers. Our models and codes must only be used for legitimate research.

Additional information

Communicated by Yasushi Yagi.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Rao, H., Leung, C. & Miao, C. Hierarchical Skeleton Meta-Prototype Contrastive Learning with Hard Skeleton Mining for Unsupervised Person Re-identification. Int J Comput Vis 132, 238–260 (2024). https://doi.org/10.1007/s11263-023-01864-0

Received: 22 August 2022
Accepted: 24 July 2023
Published: 29 August 2023
Version of record: 29 August 2023
Issue date: January 2024
DOI: https://doi.org/10.1007/s11263-023-01864-0
