Abstract
With rapid advances in depth sensors and deep learning, skeleton-based person re-identification (re-ID) models have recently achieved remarkable progress and offer many advantages. Most existing solutions learn single-level skeleton features from body joints under the assumption that all skeletons are equally important, and they typically cannot exploit more informative skeleton features from other levels, such as the limb level, which carries more global body patterns. The label dependency of these methods also limits their flexibility in learning more general skeleton representations. This paper proposes a generic unsupervised Hierarchical skeleton Meta-Prototype Contrastive learning (Hi-MPC) approach with Hard Skeleton Mining (HSM) for person re-ID with unlabeled 3D skeletons. First, we construct hierarchical representations of skeletons to model coarse-to-fine body and motion features at the levels of body joints, components, and limbs. Then, a hierarchical meta-prototype contrastive learning model is proposed to cluster and contrast the most typical skeleton features (“prototypes”) from skeletons at different levels. By converting the original prototypes into meta-prototypes with multiple homogeneous transformations, we induce the model to learn the inherent consistency of prototypes and thereby capture more effective skeleton features for person re-ID. Furthermore, we devise a hard skeleton mining mechanism that adaptively infers the informative importance of each skeleton, so that the model focuses on harder skeletons and learns more discriminative skeleton representations. Extensive evaluations on five datasets demonstrate that our approach outperforms a wide variety of state-of-the-art skeleton-based methods. We further show the general applicability of our method to cross-view person re-ID and to RGB-based scenarios with estimated skeletons.
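As a rough illustration of two ideas named in the abstract, the sketch below pools 3D joints into coarser component-level and limb-level representations and then contrasts features against cluster prototypes. It is not the released Hi-MPC implementation. The 20-joint layout, the group indices, the averaging used for pooling, the function names, and the InfoNCE-style loss with a temperature of 0.1 are all assumptions made for illustration, and cluster assignments are taken as given. Meta-prototype transformations and hard skeleton mining are not covered.

```python
import numpy as np

# Hypothetical groupings for a 20-joint skeleton (illustrative indices only,
# not the partition used in the paper).
COMPONENT_GROUPS = [[2 * i, 2 * i + 1] for i in range(10)]  # 10 components, 2 joints each
LIMB_GROUPS = [[2 * i, 2 * i + 1] for i in range(5)]        # 5 limbs, 2 components each


def pool_groups(points, groups):
    """Average the nodes in each group: (..., N, 3) -> (..., len(groups), 3)."""
    return np.stack([points[..., g, :].mean(axis=-2) for g in groups], axis=-2)


def build_hierarchy(joints):
    """Return joint-, component-, and limb-level views of a skeleton sequence."""
    components = pool_groups(joints, COMPONENT_GROUPS)
    limbs = pool_groups(components, LIMB_GROUPS)
    return {"joint": joints, "component": components, "limb": limbs}


def prototype_contrastive_loss(features, labels, temperature=0.1):
    """InfoNCE-style loss pulling each feature toward its own cluster prototype.

    features: (N, D) array; labels: (N,) cluster assignments (assumed given).
    """
    features = features / np.linalg.norm(features, axis=1, keepdims=True)
    _, idx = np.unique(np.asarray(labels), return_inverse=True)
    # A prototype is taken here as the normalized mean feature of a cluster.
    prototypes = np.stack([features[idx == k].mean(axis=0) for k in range(idx.max() + 1)])
    prototypes = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = features @ prototypes.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(len(idx)), idx].mean())


if __name__ == "__main__":
    sequence = np.random.default_rng(0).normal(size=(6, 20, 3))  # 6 frames, 20 joints
    for level, data in build_hierarchy(sequence).items():
        print(level, data.shape)
```

In a full pipeline, the loss would presumably be computed separately on features from each level of the hierarchy; how the per-level terms are combined is left open here.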
Availability of data, materials, and codes
All data are available at https://github.com/Kali-Hac/Hi-MPC.
Funding
This research is supported by the National Research Foundation, Singapore under its AI Singapore Programme (AISG Award No: AISG2-PhD/2022-01-034[T]).
Ethics declarations
Conflict of interest
The authors declare they have no conflict of interest.
Ethical Statements
The datasets used in our work are officially shared by reliable research agencies, which guarantee that the collecting, processing, releasing, and using of data have gained the formal consent of participants. To protect privacy, all individuals are anonymized with simple identity numbers. Our models and codes must only be used for legitimate research.
Additional information
Communicated by Yasushi Yagi.
About this article
Cite this article
Rao, H., Leung, C. & Miao, C. Hierarchical Skeleton Meta-Prototype Contrastive Learning with Hard Skeleton Mining for Unsupervised Person Re-identification. Int J Comput Vis 132, 238–260 (2024). https://doi.org/10.1007/s11263-023-01864-0
DOI: https://doi.org/10.1007/s11263-023-01864-0