
Transformer for Object Re-identification: A Survey

Published in: International Journal of Computer Vision

Abstract

Object Re-identification (Re-ID) aims to identify specific objects across different times and scenes, and is a widely researched task in computer vision. For a long time, this field was predominantly driven by deep learning based on convolutional neural networks. In recent years, the emergence of Vision Transformers has spurred a growing number of studies on Transformer-based Re-ID, which have continuously broken performance records and driven significant progress in the Re-ID field. Offering a powerful, flexible, and unified solution, Transformers address a wide array of Re-ID tasks with remarkable efficacy. This paper provides a comprehensive review and in-depth analysis of Transformer-based Re-ID. Categorizing existing works into Image/Video-Based Re-ID, Re-ID with limited data/annotations, Cross-Modal Re-ID, and Special Re-ID Scenarios, we thoroughly elucidate the advantages of the Transformer in addressing the many challenges across these domains. Considering the trending unsupervised Re-ID, we propose a new Transformer baseline, UntransReID, which achieves state-of-the-art performance on both single- and cross-modal tasks. For the under-explored task of animal Re-ID, we devise a standardized experimental benchmark and conduct extensive experiments to explore the applicability of Transformers and to facilitate future research. Finally, we discuss important yet under-investigated open issues in the era of large foundation models; we believe this survey will serve as a new handbook for researchers in the field. A periodically updated website is available at https://github.com/mangye16/ReID-Survey.
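At retrieval time, the Re-ID task the abstract describes — recognizing the same object across times and scenes — reduces to a nearest-neighbour search over learned embeddings: a backbone (CNN or Transformer) maps each image to a vector, and identities are matched by similarity. The sketch below illustrates only that retrieval step with toy hand-made vectors; the `cosine` and `rank_gallery` helpers and the example embeddings are illustrative assumptions, not the paper's UntransReID model, which would produce the embeddings with a Transformer backbone.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rank_gallery(query, gallery):
    # Rank gallery embeddings by similarity to the query, best match first.
    sims = [cosine(query, g) for g in gallery]
    order = sorted(range(len(gallery)), key=lambda i: -sims[i])
    return order, sims

# Toy embeddings: gallery[0] stands in for the same identity as the query.
query = [1.0, 0.0, 0.0]
gallery = [
    [0.9, 0.1, 0.0],   # near-duplicate of the query -> same identity
    [0.0, 1.0, 0.0],   # different identity
    [0.1, 0.0, 1.0],   # different identity
]
order, sims = rank_gallery(query, gallery)
print(order[0])  # -> 0 (the matching identity is ranked first)
```

In a full Re-ID pipeline the ranking above is what metrics such as Rank-1 accuracy and mAP evaluate; the research effort surveyed in the paper goes into learning embeddings for which this simple nearest-neighbour match succeeds.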


[Figures 1–8 appear in the full article.]


Data Availability

The authors declare that publicly available datasets are used for evaluating object Re-ID methods. The data supporting the experiments conducted in this study can be found in the paper. All person-related datasets used in this study were obtained and used in accordance with appropriate permissions and ethical guidelines.

References

  • (2022). Beluga id 2022. https://lila.science/datasets/beluga-id-2022/

  • (2022). Hyena id 2022. https://lila.science/datasets/hyena-id-2022/

  • (2022). Leopard id 2022. https://lila.science/datasets/leopard-id-2022/

  • Ahmed, E., Jones, M., & Marks, T. K. (2015). An improved deep learning architecture for person re-identification. In CVPR (pp. 3908–3916).

  • Bai. Y., Jiao, J., Ce, W., Liu, J., Lou, Y., Feng, X., & Duan, L. Y. (2021a). Person30k: A dual-meta generalization network for person re-identification. In CVPR (pp. 2123–2132).

  • Bai, Z., Wang, Z., Wang, J., Hu, D., & Ding, E. (2021b). Unsupervised multi-source domain adaptation for person re-identification. In CVPR (pp. 12914–12923).

  • Bergamini, L., Porrello, A., Dondona, A. C., Del Negro, E., Mattioli, M., D’alterio, N., & Calderara, S. (2018). Multi-views embedding for cattle re-identification. In IEEE SITIS (pp. 184–191).

  • Bouma, S., Pawley, M. D., Hupman, K., & Gilman, A. (2018). Individual common dolphin identification via metric embedding learning. In IEEE IVCNZ (pp. 1–6).

  • Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. (2020). Language models are few-shot learners. NeurIPS, 33, 1877–1901.

    Google Scholar 

  • Bruslund Haurum, J., Karpova, A., Pedersen, M., Hein Bengtson, S., & Moeslund, T. B. (2020). Re-identification of zebrafish using metric learning. In WACV workshop (pp. 1–11).

  • Cao, J., Pang, Y., Anwer, R. M., Cholakkal, H., Xie, J., Shah, M., & Khan, F. S. (2022). Pstr: End-to-end one-step person search with transformers. In CVPR (pp. 9458–9467).

  • Cao, M., Bai, Y., Zeng, Z., Ye, M., & Zhang, M. (2024). An empirical study of clip for text-based person search. AAAI, 38, 465–473.

    Google Scholar 

  • Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., & Joulin, A. (2021). Emerging properties in self-supervised vision transformers. In ICCV (pp. 9650–9660).

  • Chan, J., Carrión, H., Mégret, R., Agosto-Rivera, J. L., & Giray, T. (2022). Honeybee re-identification in video: New datasets and impact of self-supervision. In VISIGRAPP (5: VISAPP) (pp. 517–525).

  • Cheeseman, T., Southerland, K., Park, J., Olio, M., Flynn, K., Calambokidis, J., Jones, L., Garrigue, C., Frisch Jordan, A., Howard, A., et al. (2022). Advanced image recognition: A fully automated, high-accuracy photo-identification matching system for humpback whales. Mammalian Biology, 102(3), 915–929.

    Google Scholar 

  • Chen, B., Deng, W., & Hu, J. (2019). Mixed high-order attention network for person re-identification. In ICCV (pp 371–381).

  • Chen, C., Ye, M., Qi, M., & Du, B. (2022a). Sketch transformer: Asymmetrical disentanglement learning from dynamic synthesis. In ACM MM (pp. 4012–4020).

  • Chen, C., Ye, M., Qi, M., Wu, J., Jiang, J., & Lin, C. W. (2022). Structure-aware positional transformer for visible-infrared person re-identification. IEEE TIP, 31, 2352–2364.

    Google Scholar 

  • Chen, C., Ye, M., & Jiang, D. (2023a). Towards modality-agnostic person re-identification with descriptive query. In CVPR (pp. 15128–15137).

  • Chen, H., Lagadec, B., & Bremond, F. (2021a). Ice: Inter-instance contrastive encoding for unsupervised person re-identification. In ICCV (pp. 14960–14969).

  • Chen, S., Ye, M., & Du, B. (2022c). Rotation invariant transformer for recognizing object in UAVs. In ACM MM (pp. 2565–2574).

  • Chen, W., Xu, X., Jia, J., Luo, H., Wang, Y., Wang, F., Jin, R., & Sun, X. (2023b). Beyond appearance: A semantic controllable self-supervised learning framework for human-centric visual tasks. In CVPR (pp. 15050–15061).

  • Chen, X., Xu, C., Cao, Q., Xu, J., Zhong, Y., Xu, J., Li, Z., Wang, J., & Gao, S. (2021b). Oh-former: Omni-relational high-order transformer for person re-identification. arXiv preprint arXiv:2109.11159

  • Chen, Y. C., Zhu, X., Zheng, W. S., & Lai, J. H. (2017). Person re-identification by camera correlation aware feature augmentation. IEEE TPAMI, 40(2), 392–408.

    Google Scholar 

  • Cheng, D., Zhou, J., Wang, N., & Gao, X. (2022). Hybrid dynamic contrast and probability distillation for unsupervised person re-id. IEEE TIP, 31, 3334–3346.

    Google Scholar 

  • Cheng, D., Huang, X., Wang, N., He, L., Li, Z., & Gao, X. (2023a). Unsupervised visible-infrared person reid by collaborative learning with neighbor-guided label refinement. In ACM MM (pp. 7085–7093).

  • Cheng, D., Wang, G., Wang, B., Zhang, Q., Han, J., & Zhang, D. (2023). Hybrid routing transformer for zero-shot learning. Pattern Recognition, 137, 109270.

    Google Scholar 

  • Cheng, D., Wang, G., Wang, N., Zhang, D., Zhang, Q., & Gao, X. (2023). Discriminative and robust attribute alignment for zero-shot learning. IEEE TCSVT, 33(8), 4244–4256.

    Google Scholar 

  • Cheng, D., Li, Y., Zhang, D., Wang, N., Sun, J., & Gao, X. (2024). Progressive negative enhancing contrastive learning for image dehazing and beyond. In IEEE TMM.

  • Cheng, X., Jia, M., Wang, Q., & Zhang, J. (2022b). More is better: Multi-source dynamic parsing attention for occluded person re-identification. In ACM MM (pp. 6840–6849).

  • Cho, Y., Kim, W. J., Hong, S., & Yoon, S. E. (2022). Part-based pseudo label refinement for unsupervised person re-identification. In CVPR (pp. 7308–7318).

  • Choi, S., Kim, T., Jeong, M., Park, H., & Kim, C. (2021). Meta batch-instance normalization for generalizable person re-identification. In CVPR (pp. 3425–3435).

  • Ci, Y., Wang, Y., Chen, M., Tang, S., Bai, L., Zhu, F., Zhao, R., Yu, F., Qi, D., & Ouyang, W. (2023). Unihcp: A unified model for human-centric perceptions. In CVPR (pp. 17840–17852).

  • Comandur, B. (2022). Sports re-id: Improving re-identification of players in broadcast videos of team sports. arXiv preprint arXiv:2206.02373

  • Dai, Y., Liu, J., Sun, Y., Tong, Z., Zhang, C., & Duan, L. Y. (2021). Idm: An intermediate domain module for domain adaptive person re-id. In ICCV (pp. 11864–11874).

  • Dai, Z., Wang, G., Yuan, W., Zhu, S., & Tan, P. (2022). Cluster contrast for unsupervised person re-identification. In ACCV (pp. 1142–1160).

  • Dehghani, M., Djolonga, J., Mustafa, B., Padlewski, P., Heek, J., Gilmer, J., Steiner, A. P., Caron, M., Geirhos, R., & Alabdulmohsin, I., et al. (2023). Scaling vision transformers to 22 billion parameters. In ICML (pp. 7480–7512). PMLR.

  • Deng, W., Zheng, L., Ye, Q., Kang, G., Yang, Y., & Jiao, J. (2018). Image-image domain adaptation with preserved self-similarity and domain-dissimilarity for person re-identification. In CVPR (pp. 994–1003).

  • Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805

  • Ding, N., Qin, Y., Yang, G., Wei, F., Yang, Z., Su, Y., Hu, S., Chen, Y., Chan, C. M., Chen, W., et al. (2023). Parameter-efficient fine-tuning of large-scale pre-trained language models. Nature Machine Intelligence, 5(3), 220–235.

    Google Scholar 

  • Ding, Z., Ding, C., Shao, Z., & Tao, D. (2021). Semantically self-aligned network for text-to-image part-aware person re-identification. arXiv preprint arXiv:2107.12666

  • Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., & Gelly, S., et al. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. In ICLR.

  • Fan, L., Li, T., Fang, R., Hristov, R., Yuan, Y., & Katabi, D. (2020). Learning longterm representations for person re-identification using radio signals. In CVPR (pp. 10699–10709).

  • Farooq, A., Awais, M., Kittler, J., & Khalid, S. S. (2022). Axm-net: Implicit cross-modal feature alignment for person re-identification. AAAI, 36, 4477–4485.

    Google Scholar 

  • Feng, Y., Yu, J., Chen, F., Ji, Y., Wu, F., Liu, S., & Jing, X. Y. (2022). Visible-infrared person re-identification via cross-modality interaction transformer. In IEEE TMM.

  • Ferdous, S. N., Li, X., & Lyu, S. (2022). Uncertainty aware multitask pyramid vision transformer for uav-based object re-identification. In ICIP (pp. 2381–2385). IEEE.

  • Fu, D., Chen, D., Bao, J., Yang, H., Yuan, L., Zhang, L., Li, H., & Chen, D. (2021). Unsupervised pre-training for person re-identification. In CVPR (pp. 14750–14759).

  • Gao, J., Burghardt, T., Andrew, W., Dowsey, A. W., & Campbell, N. W. (2021). Towards self-supervision for video identification of individual holstein-friesian cattle: The cows2021 dataset. arXiv preprint arXiv:2105.01938

  • Ge, Y., Zhu, F., Chen, D., Zhao, R., et al. (2020). Self-paced contrastive learning with hybrid memory for domain adaptive object re-id. NeurIPS, 33, 11309–11321.

    Google Scholar 

  • Gray, D., Brennan, S., & Tao, H. (2007). Evaluating appearance models for recognition, reacquisition, and tracking. PETS, 3, 1–7.

    Google Scholar 

  • Gu, J., Luo, H., Wang, K., Jiang, W., You, Y., & Zhao, J. (2023). Color prompting for data-free continual unsupervised domain adaptive person re-identification. arXiv preprint arXiv:2308.10716

  • Guo, H., Zhu, K., Tang, M., & Wang, J. (2019). Two-level attention network with multi-grain ranking loss for vehicle re-identification. IEEE TIP, 28(9), 4328–4338.

    MathSciNet  Google Scholar 

  • Guo, P., Liu, H., Wu, J., Wang, G., & Wang, T. (2023). Semantic-aware consistency network for cloth-changing person re-identification. arXiv preprint arXiv:2308.14113

  • Han, K., Wang, Y., Chen, H., Chen, X., Guo, J., Liu, Z., Tang, Y., Xiao, A., Xu, C., Xu, Y., et al. (2022). A survey on vision transformer. IEEE TPAMI, 45(1), 87–110.

    Google Scholar 

  • Han, X., He, S., Zhang, L., & Xiang, T. (2021). Text-based person search with limited data. arXiv:2110.10807

  • He, B., Li, J., Zhao, Y., & Tian, Y. (2019). Part-regularized near-duplicate vehicle re-identification. In CVPR (pp. 3997–4005).

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In CVPR (pp. 770–778).

  • He, K., Chen, X., Xie, S., Li, Y., Dollár, P., & Girshick, R. (2022). Masked autoencoders are scalable vision learners. In CVPR (pp. 16000–16009).

  • He, S., Luo, H., Wang, P., Wang, F., Li, H., & Jiang, W. (2021a). Transreid: Transformer-based object re-identification. In ICCV (pp. 15013–15022).

  • He, S., Chen, W., Wang, K., Luo, H., Wang, F., Jiang, W., & Ding, H. (2023a). Region generation and assessment network for occluded person re-identification. In IEEE TIFS.

  • He, S., Luo, H., Jiang, W., Jiang, X., & Ding, H. (2023). Vgsg: Vision-guided semantic-group network for text-based person search. IEEE TIP, 33, 163–176.

    Google Scholar 

  • He, T., Jin, X., Shen, X., Huang, J., Chen, Z., Hua, X. S. (2021b). Dense interaction learning for video-based person re-identification. In ICCV (pp. 1490–1501).

  • He, T., Shen, X., Huang, J., Chen, Z., & Hua, X. S. (2021c). Partial person re-identification with part-part correspondence learning. In CVPR (pp. 9105–9115).

  • He, W., Deng, Y., Tang, S., Chen, Q., Xie, Q., Wang, Y., Bai, L., Zhu, F., Zhao, R., & Ouyang, W., et al. (2024). Instruct-reid: A multi-purpose person re-identification task with instructions. In CVPR (pp. 17521–17531).

  • Hermans, A., Beyer, L., & Leibe, B. (2017). In defense of the triplet loss for person re-identification. arXiv preprint arXiv:1703.07737

  • Hong, P., Wu, T., Wu, A., Han, X., & Zheng, W. S. (2021). Fine-grained shape-appearance mutual learning for cloth-changing person re-identification. In CVPR (pp. 10513–10522).

  • Howard, A., Ken, I., Southerland Holbrook. R., & Cheeseman, T. (2022). Happywhale - whale and dolphin identification. https://kaggle.com/competitions/happy-whale-and-dolphin

  • Jia, M., Cheng, X., Lu, S., & Zhang, J. (2022). Learning disentangled representation implicitly via transformer for occluded person re-identification. IEEE TMM, 25, 1294–1305.

    Google Scholar 

  • Jia, X., Zhong, X., Ye, M., Liu, W., & Huang, W. (2022). Complementary data augmentation for cloth-changing person re-identification. IEEE TIP, 31, 4227–4239.

    Google Scholar 

  • Jiang, D., & Ye, M. (2023). Cross-modal implicit relation reasoning and aligning for text-to-image person retrieval. In CVPR (pp. 2787–2797).

  • Jiang, K., Zhang, T., Liu, X., Qian, B., Zhang, Y., & Wu, F. (2022). Cross-modality transformer for visible-infrared person re-identification. In ECCV (pp. 480–496). Springer.

  • Jiao, B., Liu, L., Gao, L., Wu, R., Lin, G., Wang, P., & Zhang, Y. (2023). Toward re-identifying any animal. In NeurIPS.

  • Jin, X., Lan, C., Zeng, W., Chen, Z., & Zhang, L. (2020). Style normalization and restitution for generalizable person re-identification. In CVPR (pp. 3143–3152).

  • Jin, X., He, T., Zheng, K., Yin, Z., Shen, X., Huang, Z., Feng, R., Huang, J., Chen, Z., & Hua, X. S. (2022). Cloth-changing person re-identification from a single image with gait prediction and regularization. In CVPR (pp. 14278–14287).

  • Kalayeh, M. M., Basaran, E., Gökmen, M., Kamasak, M. E., & Shah, M. (2018). Human semantic parsing for person re-identification. In CVPR (pp. 1062–1071).

  • Khan, S. D., & Ullah, H. (2019). A survey of advances in vision-based vehicle re-identification. CVIU, 182, 50–63.

    Google Scholar 

  • Khorramshahi, P., Kumar, A., Peri, N., Rambhatla, S. S., Chen, J. C., & Chellappa, R. (2019). A dual-path model with adaptive attention for vehicle re-identification. In ICCV (pp. 6132–6141).

  • Koch, G., Zemel, R., & Salakhutdinov, R., et al. (2015). Siamese neural networks for one-shot image recognition. In ICML workshop (vol. 2). Lille.

  • Konovalov, D. A., Hillcoat, S., Williams, G., Birtles, R. A., Gardiner, N., & Curnock, M. I. (2018). Individual minke whale recognition using deep learning convolutional neural networks. Journal of Geoscience and Environment Protection, 6, 25–36.

    Google Scholar 

  • Korschens, M., & Denzler, J. (2019). Elpephants: A fine-grained dataset for elephant re-identification. In ICCV workshop.

  • Kumar, S., Yaghoubi, E., Das, A., Harish, B., & Proença, H. (2020). The p-destre: A fully annotated dataset for pedestrian detection, tracking, re-identification and search from aerial devices. arXiv preprint arXiv:2004.02782

  • Kuncheva, L. I., Williams, F., Hennessey, S. L., & Rodríguez, J. J. (2022). A benchmark database for animal re-identification and tracking. In IEEE IPAS (pp. 1–6). IEEE.

  • Lai, S., Chai, Z., & Wei, X. (2021). Transformer meets part model: Adaptive part division for person re-identification. In ICCV (pp. 4150–4157).

  • Lee, K. W., Jawade, B., Mohan, D., Setlur, S., & Govindaraju, V. (2022). Attribute de-biased vision transformer (ad-vit) for long-term person re-identification. In IEEE AVSS (pp. 1–8) . IEEE.

  • Li, H., Li, C., Zhu, X., Zheng, A., & Luo, B. (2020). Multi-spectral vehicle re-identification: A challenge. AAAI, 34, 11345–11353.

    Google Scholar 

  • Li, H., Wu, G., & Zheng, W. S. (2021a). Combined depth space based architecture search for person re-identification. In CVPR (pp. 6729–6738).

  • Li, H., Ye, M., & Du, B. (2021b). Weperson: Learning a generalized re-identification model from all-weather virtual data. In ACM MM (pp. 3115–3123).

  • Li, H., Li, C., Zheng, A., Tang, J., & Luo, B. (2022). Mskat: Multi-scale knowledge-aware transformer for vehicle re-identification. IEEE TITS, 23(10), 19557–19568.

    Google Scholar 

  • Li, H., Ye, M., Wang, C., & Du, B. (2022b). Pyramidal transformer with conv-patchify for person re-identification. In ACM MM (pp. 7317–7326).

  • Li, H., Ye, M., Zhang, M., Du, B. (2024a). All in one framework for multimodal re-identification in the wild. In CVPR (pp. 17459–17469).

  • Li, M., Zhu, X., & Gong, S. (2019). Unsupervised tracklet person re-identification. IEEE TPAMI, 42(7), 1770–1782.

    Google Scholar 

  • Li, S., Xiao, T., Li, H., Zhou, B., Yue, D., & Wang, X. (2017). Person search with natural language description. In CVPR (pp. 1970–1979).

  • Li, S., Li, J., Tang, H., Qian, R., & Lin, W. (2019b). Atrw: A benchmark for amur tiger re-identification in the wild. arXiv preprint arXiv:1906.05586

  • Li, S., Fu, L., Sun, Y., Mu, Y., Chen, L., Li, J., & Gong, H. (2021). Individual dairy cow identification based on lightweight convolutional neural network. Plos one, 16(11), e0260510.

    Google Scholar 

  • Li, S., Sun, L., & Li, Q. (2023). Clip-reid: Exploiting vision-language model for image re-identification without concrete text labels. AAAI, 37, 1405–1413.

    Google Scholar 

  • Li, T., Liu, J., Zhang, W., Ni, Y., Wang, W., & Li, Z. (2021d). Uav-human: A large benchmark for human behavior understanding with unmanned aerial vehicles. In CVPR (pp. 16266–16275).

  • Li, W., Zhao, R., Xiao, T., & Wang, X. (2014). Deepreid: Deep filter pairing neural network for person re-identification. In CVPR (pp. 152–159).

  • Li, W., Zhu, X., & Gong, S. (2018). Harmonious attention network for person re-identification. In CVPR (pp. 2285–2294).

  • Li, W., Zou, C., Wang, M., Xu, F., Zhao, J., Zheng, R., Cheng, Y., & Chu, W. (2023b). Dc-former: Diverse and compact transformer for person re-identification. arXiv preprint arXiv:2302.14335

  • Li, Y., He, J., Zhang, T., Liu, X., Zhang, Y., & Wu, F. (2021e). Diverse part discovery: Occluded person re-identification with part-aware transformer. In CVPR (pp. 2898–2907).

  • Li, Y., Liu, Y., Zhang, H., Zhao, C., Wei, Z., & Miao, D. (2024b). Occlusion-aware transformer with second-order attention for person re-identification. IEEE TIP

  • Liang, T., Jin, Y., Liu, W., & Li, Y. (2023). Cross-modality transformer with modality mining for visible-infrared person re-identification. IEEE TMM

  • Liao, S., & Shao, L. (2021). Transmatcher: Deep image matching through transformers for generalizable person re-identification. NeurIPS, 34, 1992–2003.

    Google Scholar 

  • Liao, S., Hu, Y., Zhu, X., & Li, S. Z. (2015). Person re-identification by local maximal occurrence representation and metric learning. In CVPR (pp. 2197–2206).

  • Lin, W., Li, Y., Xiao, H., See, J., Zou, J., Xiong, H., Wang, J., & Mei, T. (2019). Group reidentification with multigrained matching and integration. IEEE transactions on cybernetics, 51(3), 1478–1492.

    Google Scholar 

  • Lin, Y., Dong, X., Zheng, L., Yan, Y., & Yang, Y. (2019). A bottom-up clustering approach to unsupervised person re-identification. AAAI, 33, 8738–8745.

    Google Scholar 

  • Lin, Y., Xie, L., Wu, Y., Yan, C., & Tian, Q. (2020). Unsupervised person re-identification via softened similarity learning. In CVPR (pp. 3390–3399).

  • Liu, F., Ye, M., & Du, B. (2023a). Dual level adaptive weighting for cloth-changing person re-identification. IEEE TIP

  • Liu, H., Jie, Z., Jayashree, K., Qi, M., Jiang, J., Yan, S., & Feng, J. (2017). Video-based person re-identification with accumulative motion context. IEEE Transactions on Circuits and Systems for Video Technology, 28(10), 2788–2802.

    Google Scholar 

  • Liu, X., Liu, W., Ma, H., & Fu, H. (2016a). Large-scale vehicle re-identification in urban surveillance videos. In ICME (pp. 1–6). IEEE.

  • Liu, X., Liu, W., Mei, T., & Ma, H. (2016b). A deep learning-based approach to progressive vehicle re-identification for urban surveillance. In ECCV (pp. 869–884). Springer.

  • Liu, X., Zhang, P., Yu, C., Lu, H., Qian, X., & Yang, X. (2021a). A video is worth three views: Trigeminal transformers for video-based person re-identification. arXiv preprint arXiv:2104.01745

  • Liu, X., Yu, C., Zhang, P., & Lu, H. (2023b). Deeply coupled convolution–transformer with spatial–temporal complementary learning for video-based person re-identification. In IEEE TNNLS.

  • Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., & Guo, B. (2021b). Swin transformer: Hierarchical vision transformer using shifted windows. arXiv preprint arXiv:2103.14030

  • Lou, Y., Bai, Y., Liu, J., Wang, S., & Duan, L. (2019). Veri-wild: A large dataset and a new method for vehicle re-identification in the wild. In CVPR (pp. 3235–3243).

  • Luo, H., Jiang, W., Gu, Y., Liu, F., Liao, X., Lai, S., & Gu, J. (2019). A strong baseline and batch normalization neck for deep person re-identification. IEEE TMM, 22(10), 2597–2609.

    Google Scholar 

  • Luo, H., Wang, P., Xu, Y., Ding, F., Zhou, Y., Wang, F., Li, H., & Jin, R. (2021). Self-supervised pre-training for transformer-based person re-identification. arXiv preprint arXiv:2111.12084

  • Mallat, S. G. (1989). A theory for multiresolution signal decomposition: The wavelet representation. IEEE TPAMI, 11(7), 674–693.

    Google Scholar 

  • Mao, J., Yao, Y., Sun, Z., Huang, X., Shen, F., & Shen, H. T. (2023). Attention map guided transformer pruning for occluded person re-identification on edge device. In IEEE TMM.

  • McLaughlin, N., Del Rincon, J. M., & Miller, P. (2016). Recurrent convolutional network for video-based person re-identification. In CVPR (pp. 1325–1334).

  • Meng, D., Li, L., Liu, X., Li, Y., Yang, S., Zha, Z. J., Gao, X., Wang, S., Huang, Q. (2020). Parsing-based view-aware embedding network for vehicle re-identification. In CVPR (pp. 7103–7112).

  • Miao, J., Wu, Y., Liu, P., Ding, Y., & Yang, Y. (2019). Pose-guided feature alignment for occluded person re-identification. In ICCV (pp. 542–551).

  • Moskvyak, O., Maire, F., Dayoub, F., & Baktashmotlagh, M. (2020). Learning landmark guided embeddings for animal re-identification. In WACV workshop (pp. 12–19).

  • Moskvyak, O., Maire, F., Dayoub, F., Armstrong, A. O., & Baktashmotlagh, M. (2021). Robust re-identification of manta rays from natural markings by learning pose invariant embeddings. In DICTA (pp. 1–8). IEEE.

  • Naseer, M., Ranasinghe, K., Khan, S., Hayat, M., Khan, F. S., & Yang, M. H. (2021). Intriguing properties of vision transformers. arXiv preprint arXiv:2105.10497

  • Nepovinnykh, E., Eerola, T., & Kalviainen, H. (2020). Siamese network based pelage pattern matching for ringed seal re-identification. In WACV workshop (pp. 25–34).

  • Nepovinnykh, E., Eerola, T., Biard, V., Mutka, P., Niemi, M., Kunnasranta, M., & Kälviäinen, H. (2022). Sealid: Saimaa ringed seal re-identification dataset. Sensors, 22(19), 7602.

    Google Scholar 

  • Nguyen, D. T., Hong, H. G., Kim, K. W., & Park, K. R. (2017). Person recognition system based on a combination of body images from visible light and thermal cameras. Sensors, 17(3), 605.

    Google Scholar 

  • Ni, H., Song, J., Luo, X., Zheng, F., Li, W., & Shen, H. T. (2022). Meta distribution alignment for generalizable person re-identification. In CVPR (pp. 2487–2496).

  • Ni, H., Li, Y., Gao, L., Shen, H. T., & Song, J. (2023). Part-aware transformer for generalizable person re-identification. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 11280–11289).

  • Niu, K., Huang, Y., Ouyang, W., & Wang, L. (2020). Improving description-based person re-identification by multi-granularity image-text alignments. In IEEE TIP (pp. 5542–5556).

  • Organisciak, D., Poyser, M., Alsehaim, A., Hu, S., Isaac-Medina, B. K., Breckon, T. P., Shum, H. P. (2021). Uav-reid: A benchmark on unmanned aerial vehicle re-identification in video imagery. arXiv preprint arXiv:2104.06219

  • Pang, L., Wang, Y., Song, Y. Z., Huang, T., Tian, Y. (2018). Cross-domain adversarial feature learning for sketch re-identification. In ACM MM (pp. 609–617).

  • Papafitsoros, K., Adam, L., Čermák, V., & Picek, L. (2022). Seaturtleid: A novel long-span dataset highlighting the importance of timestamps in wildlife re-identification. arXiv preprint arXiv:2211.10307

  • Parham, J., Crall, J., Stewart, C., Berger-Wolf, T., Rubenstein, D. I. (2017). Animal population censusing at scale with citizen science and photographic identification. In AAAI.

  • Park, H., & Ham, B. (2020). Relation network for person re-identification. AAAI, 34, 11839–11847.

    Google Scholar 

  • Porrello, A., Bergamini, L., & Calderara, S. (2020). Robust re-identification by multiple views knowledge distillation. In ECCV (pp. 93–110). Springer.

  • Pu, N., Zhong, Z., Sebe, N., Lew, M. S. (2023). A memorizing and generalizing framework for lifelong person re-identification. In IEEE TPAMI

  • Qian, W., Luo, H., Peng, S., Wang, F., Chen, C., & Li, H. (2022). Unstructured feature decoupling for vehicle re-identification. In ECCV (pp. 336–353).

  • Qian, X., Wang, W., Zhang, L., Zhu, F., Fu, Y., Xiang, T., Jiang, Y. G., & Xue, X. (2020). Long-term cloth-changing person re-identification. In ACCV.

  • Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., & Clark, J., et al. (2021). Learning transferable visual models from natural language supervision. In ICML (pp. 8748–8763). PMLR.

  • Rao, H., & Miao, C. (2023). Transg: Transformer-based skeleton graph prototype contrastive learning with structure-trajectory prompted reconstruction for person re-identification. In CVPR (pp. 22118–22128).

  • Rao, H., Wang, S., Hu, X., Tan, M., Guo, Y., Cheng, J., Liu, X., & Hu, B. (2021). A self-supervised gait encoding approach with locality-awareness for 3d skeleton based person re-identification. IEEE TPAMI, 44(10), 6649–6666.

    Google Scholar 

  • Rao, H., Leung, C., & Miao, C. (2024). Hierarchical skeleton meta-prototype contrastive learning with hard skeleton mining for unsupervised person re-identification. IJCV, 132(1), 238–260.

    Google Scholar 

  • Sarafianos, N., Xu, X., & Kakadiaris, I. A. (2019). Adversarial representation learning for text-to-image matching. In ICCV (pp. 5814–5824).

  • Schneider, S., Taylor, G. W., Linquist, S., & Kremer, S. C. (2019). Past, present and future approaches using computer vision for animal re-identification from camera trap data. Methods in Ecology and Evolution, 10(4), 461–470.

    Google Scholar 

  • Shao, Z., Zhang, X., Fang, M., Lin, Z., Wang, J., & Ding, C. (2022). Learning granularity-unified representations for text-to-image person re-identification. In ACM MM (pp. 5566–5574).

  • Shao, Z., Zhang, X., Ding, C., Wang, J., & Wang, J. (2023). Unified pre-training with pseudo texts for text-to-image person re-identification. In ICCV (pp. 11174–11184).

  • Shen, F., Xie, Y., Zhu, J., Zhu, X., & Zeng, H. (2023). Git: Graph interactive transformer for vehicle re-identification. IEEE TIP, 32, 1039–1051.

    Google Scholar 

  • Shen, L., He, T., Guo, Y., & Ding, G. (2023b). X-reid: Cross-instance transformer for identity-level person re-identification. arXiv preprint arXiv:2302.02075

  • Shu, X., Wen, W., Wu, H., Chen, K., Song, Y., Qiao, R., Ren, B., & Wang, X. (2022). See finer, see more: Implicit modality alignment for text-based person retrieval. In ECCV (pp. 624–641). Springer.

  • Song, G., Leng, B., Liu, Y., Hetang, C., & Cai, S. (2018). Region-based quality estimation network for large-scale person re-identification. In AAAI (vol. 32).

  • Su, C., Li, J., Zhang, S., Xing, J., Gao, W., & Tian, Q. (2017). Pose-driven deep convolutional model for person re-identification. In ICCV (pp. 3960–3969).

  • Suh, Y., Wang, J., Tang, S., Mei, T., & Lee, K. M. (2018). Part-aligned bilinear representations for person re-identification. In ECCV (pp. 402–419).

  • Sun, C. C., Arr, G. S., Ramachandran, R. P., & Ritchie, S. G. (2004). Vehicle reidentification using multidetector fusion. IEEE TITS, 5(3), 155–164.

    Google Scholar 

  • Sun, X., & Zheng, L. (2019). Dissecting person re-identification from the viewpoint of viewpoint. In CVPR (pp. 608–617).

  • Sun, Y., Zheng, L., Yang, Y., Tian, Q., & Wang, S. (2018). Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline). In ECCV (pp. 480–496)

  • Tan, B., Xu, L., Qiu, Z., Wu, Q., & Meng, F. (2023). Mfat: A multi-level feature aggregated transformer for person re-identification. In ICASSP (pp. 1–5). IEEE.

  • Tan, W., Ding, C., Jiang, J., Wang, F., Zhan, Y., & Tao, D. (2024). Harnessing the power of mllms for transferable text-to-image person reid. In CVPR (pp. 17127–17137).

  • Tang, S., Chen, C., Xie, Q., Chen, M., Wang, Y., Ci, Y., Bai, L., Zhu, F., Yang, H., & Yi, L., et al. (2023). Humanbench: Towards general human-centric perception with projector assisted pretraining. In CVPR (pp. 21970–21982).

  • Tang, Z., Naphade, M., Liu, M. Y., Yang, X., Birchfield, S., Wang, S., Kumar, R., Anastasiu, D., & Hwang, J. N. (2019). Cityflow: A city-scale benchmark for multi-target multi-camera vehicle tracking and re-identification. In CVPR (pp. 8797–8806).

  • Tang, Z., Zhang, R., Peng, Z., Chen, J., & Lin, L. (2022). Multi-stage spatio-temporal aggregation transformer for video person re-identification. In IEEE TMM.

  • Teng, S., Zhang, S., Huang, Q., & Sebe, N. (2021). Viewpoint and scale consistency reinforcement for uav vehicle re-identification. IJCV, 129, 719–735.

    Google Scholar 

  • Tian, X., Liu, J., Zhang, Z., Wang, C., Qu, Y., Xie, Y., & Ma, L. (2022). Hierarchical walking transformer for object re-identification. In ACM MM (pp. 4224–4232).

  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In NeurIPS (vol. 30).

  • Walmer, M., Suri, S., Gupta, K., & Shrivastava, A. (2023). Teaching matters: Investigating the role of supervision in vision transformers. In CVPR (pp. 7486–7496).

  • Wang, D., & Zhang, S. (2020). Unsupervised person re-identification via multi-label classification. In CVPR (pp. 10981–10990).

  • Wang, G., Zhang, T., Cheng, J., Liu, S., Yang, Y., & Hou, Z. (2019a). Rgb-infrared cross-modality person re-identification via joint pixel and feature alignment. In ICCV (pp. 3623–3632).

  • Wang, G., Yang, S., Liu, H., Wang, Z., Yang, Y., Wang, S., Yu, G., Zhou, E., & Sun, J. (2020a). High-order information matters: Learning relation and topology for occluded person re-identification. In CVPR (pp. 6449–6458).

  • Wang, G., Yu, F., Li, J., Jia, Q., & Ding, S. (2023a). Exploiting the textual potential from vision-language pre-training for text-based person search. arXiv preprint arXiv:2303.04497

  • Wang, G. A., Zhang, T., Yang, Y., Cheng, J., Chang, J., Liang, X., & Hou, Z. G. (2020). Cross-modality paired-images generation for rgb-infrared person re-identification. AAAI, 34, 12144–12151.

  • Wang, H., Shen, J., Liu, Y., Gao, Y., & Gavves, E. (2022a). Nformer: Robust person re-identification with neighbor transformer. In CVPR (pp. 7297–7307).

  • Wang, J., Zhang, Z., Chen, M., Zhang, Y., Wang, C., Sheng, B., Qu, Y., & Xie, Y. (2022b). Optimal transport for label-efficient visible-infrared person re-identification. In ECCV (pp. 93–109). Springer.

  • Wang, L., Ding, R., Zhai, Y., Zhang, Q., Tang, W., Zheng, N., & Hua, G. (2021). Giant panda identification. IEEE TIP, 30, 2837–2849.

  • Wang, P., Jiao, B., Yang, L., Yang, Y., Zhang, S., Wei, W., & Zhang, Y. (2019b). Vehicle re-identification in aerial imagery: Dataset and approach. In ICCV (pp. 460–469).

  • Wang, T., Liu, H., Song, P., Guo, T., & Shi, W. (2022). Pose-guided feature disentangling for occluded person re-identification based on transformer. AAAI, 36, 2540–2549.

  • Wang, T., Liu, H., Li, W., Ban, M., Guo, T., & Li, Y. (2023b). Feature completion transformer for occluded person re-identification. arXiv preprint arXiv:2303.01656

  • Wang, W., Xie, E., Li, X., Fan, D. P., Song, K., Liang, D., Lu, T., Luo, P., & Shao, L. (2021b). Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. arXiv preprint arXiv:2102.12122

  • Wang, X., Wang, X., Jiang, B., & Luo, B. (2023c). Few-shot learning meets transformer: Unified query-support transformers for few-shot classification. IEEE TCSVT.

  • Wang, Y., Qi, G., Li, S., Chai, Y., & Li, H. (2022). Body part-level domain alignment for domain-adaptive person re-identification with transformer framework. IEEE TIFS, 17, 3321–3334.

  • Wang, Z., Wang, Z., Zheng, Y., Chuang, Y. Y., & Satoh, S. (2019c). Learning to reduce dual-level discrepancy for infrared-visible person re-identification. In CVPR (pp. 618–626).

  • Wang, Z., Wang, Z., Zheng, Y., Wu, Y., Zeng, W., & Satoh, S. (2019d). Beyond intra-modality: A survey of heterogeneous person re-identification. arXiv preprint arXiv:1905.10048

  • Wang, Z., Fang, Z., Wang, J., & Yang, Y. (2020c). Vitaa: Visual-textual attributes alignment in person search by natural language. In ECCV (pp. 402–420). Springer.

  • Wei, L., Zhang, S., Gao, W., & Tian, Q. (2018). Person transfer gan to bridge domain gap for person re-identification. In CVPR (pp. 79–88).

  • Wei, R., Gu, J., He, S., & Jiang, W. (2022). Transformer-based domain-specific representation for unsupervised domain adaptive vehicle re-identification. IEEE TITS, 24(3), 2935–2946.

  • Weideman, H., Stewart, C., Parham, J., Holmberg, J., Flynn, K., Calambokidis, J., Paul, D. B., Bedetti, A., Henley, M., & Pope, F., et al. (2020). Extracting identifying contours for African elephants and humpback whales using a learned appearance model. In WACV (pp. 1276–1285).

  • Weideman, H. J., Jablons, Z. M., Holmberg, J., Flynn, K., Calambokidis, J., Tyson, R. B., Allen, J. B., Wells, R. S., Hupman, K., & Urian, K., et al. (2017). Integral curvature representation and matching algorithms for identification of dolphins and whales. In ICCV workshop (pp. 2831–2839).

  • Wu, A., Zheng, W. S., Yu, H. X., Gong, S., & Lai, J. (2017). Rgb-infrared cross-modality person re-identification. In ICCV (pp. 5380–5389).

  • Wu, J., He, L., Liu, W., Yang, Y., Lei, Z., Mei, T., & Li, S. Z. (2022a). Cavit: Contextual alignment vision transformer for video object re-identification. In ECCV (pp. 549–566). Springer.

  • Wu, L., Liu, D., Zhang, W., Chen, D., Ge, Z., Boussaid, F., Bennamoun, M., & Shen, J. (2022). Pseudo-pair based self-similarity learning for unsupervised person re-identification. IEEE TIP, 31, 4803–4816.

  • Wu, P., Wang, L., Zhou, S., Hua, G., & Sun, C. (2024). Temporal correlation vision transformer for video person re-identification. AAAI, 38, 6083–6091.

  • Wu, Y., Yan, Z., Han, X., Li, G., Zou, C., & Cui, S. (2021). Lapscore: language-guided person search via color reasoning. In ICCV (pp. 1624–1633).

  • Wu, Z., & Ye, M. (2023). Unsupervised visible-infrared person re-identification via progressive graph matching and alternate learning. In CVPR (pp. 9548–9558).

  • Xiao, H., Lin, W., Sheng, B., Lu, K., Yan, J., Wang, J., Ding, E., Zhang, Y., & Xiong, H. (2018). Group re-identification: Leveraging and integrating multi-grain information. In ACM MM (pp. 192–200).

  • Xiao, T., Li, S., Wang, B., Lin, L., & Wang, X. (2017). Joint detection and identification feature learning for person search. In CVPR (pp. 3415–3424).

  • Xie, Z., Zhang, Z., Cao, Y., Lin, Y., Bao, J., Yao, Z., Dai, Q., & Hu, H. (2022). Simmim: A simple framework for masked image modeling. In CVPR (pp. 9653–9663).

  • Xu, B., He, L., Liang, J., & Sun, Z. (2022). Learning feature recovery transformer for occluded person re-identification. IEEE TIP, 31, 4651–4662.

  • Xu, P., & Zhu, X. (2023). Deepchange: A long-term person re-identification benchmark with clothes change. In ICCV (pp. 11196–11205).

  • Xu, P., Zhu, X., & Clifton, D. A. (2023). Multimodal learning with transformers: A survey. IEEE TPAMI.

  • Xu, W., Liu, H., Shi, W., Miao, Z., Lu, Z., & Chen, F. (2021). Adversarial feature disentanglement for long-term person re-identification. In IJCAI (pp. 1201–1207).

  • Xuan, S., & Zhang, S. (2021). Intra-inter camera similarity for unsupervised person re-identification. In CVPR (pp. 11926–11935).

  • Yan, K., Tian, Y., Wang, Y., Zeng, W., & Huang, T. (2017). Exploiting multi-grain ranking constraints for precisely searching visually-similar vehicles. In ICCV (pp. 562–570).

  • Yan, S., Dong, N., Zhang, L., & Tang, J. (2022). Clip-driven fine-grained text-image person re-identification. arXiv preprint arXiv:2210.10276

  • Yan, Y., Ni, B., Song, Z., Ma, C., Yan, Y., & Yang, X. (2016). Person re-identification via recurrent feature aggregation. In ECCV (pp. 701–716). Springer.

  • Yan, Y., Qin, J., Ni, B., Chen, J., Liu, L., Zhu, F., Zheng, W. S., Yang, X., & Shao, L. (2020). Learning multi-attention context graph for group-based re-identification. IEEE TPAMI, 45(6), 7001–7018.

  • Yang, B., Ye, M., Chen, J., & Wu, Z. (2022). Augmented dual-contrastive aggregation learning for unsupervised visible-infrared person re-identification. In ACM MM (pp. 2843–2851).

  • Yang, B., Chen, J., & Ye, M. (2023a). Top-k visual tokens transformer: Selecting tokens for visible-infrared person re-identification. In ICASSP (pp. 1–5). IEEE.

  • Yang, B., Chen, J., & Ye, M. (2023b). Towards grand unified representation learning for unsupervised visible-infrared person re-identification. In ICCV (pp. 11069–11079).

  • Yang, Q., Wu, A., & Zheng, W. S. (2019). Person re-identification by contour sketch under moderate clothing change. IEEE TPAMI, 43(6), 2029–2046.

  • Yang, S., Zhou, Y., Zheng, Z., Wang, Y., Zhu, L., & Wu, Y. (2023c). Towards unified text-based person retrieval: A large-scale multi-attribute and language search benchmark. In ACM MM (pp. 4492–4501).

  • Yang, Z., Wu, D., Wu, C., Lin, Z., Gu, J., & Wang, W. (2024). A pedestrian is worth one prompt: Towards language guidance person re-identification. In CVPR (pp. 17343–17353).

  • Yao, Y., Zheng, L., Yang, X., Naphade, M., & Gedeon, T. (2020). Simulating content consistent vehicle datasets with attribute descent. In ECCV (pp. 775–791). Springer.

  • Ye, M., Liang, C., Wang, Z., Leng, Q., Chen, J., & Liu, J. (2015). Specific person retrieval via incomplete text description. In ACM ICMR (pp. 547–550).

  • Ye, M., Lan, X., Li, J., & Yuen, P. (2018). Hierarchical discriminative learning for visible thermal person re-identification. In AAAI (vol. 32).

  • Ye, M., Cheng, Y., Lan, X., & Zhu, H. (2019). Improving night-time pedestrian retrieval with distribution alignment and contextual distance. IEEE TII, 16(1), 615–624.

  • Ye, M., Lan, X., Wang, Z., & Yuen, P. C. (2019). Bi-directional center-constrained top-ranking for visible thermal person re-identification. IEEE TIFS, 15, 407–419.

  • Ye, M., Shen, J., & Shao, L. (2020). Visible-infrared person re-identification via homogeneous augmented tri-modal learning. IEEE TIFS, 16, 728–739.

  • Ye, M., Shen, J., Zhang, X., Yuen, P. C., & Chang, S. F. (2020b). Augmentation invariant and instance spreading feature for softmax embedding. IEEE TPAMI.

  • Ye, M., Li, H., Du, B., Shen, J., Shao, L., & Hoi, S. C. (2021). Collaborative refining for person re-identification with label noise. IEEE TIP, 31, 379–391.

  • Ye, M., Ruan, W., Du, B., & Shou, M. Z. (2021b). Channel augmented joint learning for visible-infrared recognition. In ICCV (pp. 13567–13576).

  • Ye, M., Shen, J., Lin, G., Xiang, T., Shao, L., & Hoi, S. C. H. (2021c). Deep learning for person re-identification: A survey and outlook. IEEE TPAMI.

  • Ye, M., Wu, Z., Chen, C., & Du, B. (2023). Channel augmentation for visible-infrared re-identification. IEEE TPAMI, 01, 1–16.

  • Ye, Y., Zhou, H., Yu, J., Hu, Q., & Yang, W. (2022). Dynamic feature pruning and consolidation for occluded person re-identification. arXiv preprint arXiv:2211.14742

  • Yu, H. X., Zheng, W. S., Wu, A., Guo, X., Gong, S., & Lai, J. H. (2019). Unsupervised person re-identification by soft multilabel learning. In CVPR (pp. 2148–2157).

  • Yu, R., Du, D., LaLonde, R., Davila, D., Funk, C., Hoogs, A., & Clipp, B. (2022). Cascade transformers for end-to-end person search. In CVPR (pp. 7267–7276).

  • Zapletal, D., & Herout, A. (2016). Vehicle re-identification for automatic video traffic surveillance. In CVPR workshop (pp. 25–31).

  • Zhai, X., Kolesnikov, A., Houlsby, N., & Beyer, L. (2022a). Scaling vision transformers. In CVPR (pp. 12104–12113).

  • Zhai, Y., Zeng, Y., Cao, D., & Lu, S. (2022b). Trireid: Towards multi-modal person re-identification via descriptive fusion model. In ICMR (pp. 63–71).

  • Zhang, B., Liang, Y., & Du, M. (2022a). Interlaced perception for person re-identification based on swin transformer. In IEEE ICIVC (pp. 24–30).

  • Zhang, G., Zhang, P., Qi, J., & Lu, H. (2021a). Hat: Hierarchical aggregation transformers for person re-identification. In ACM MM (pp. 516–525).

  • Zhang, G., Zhang, Y., Zhang, T., Li, B., & Pu, S. (2023a). Pha: Patch-wise high-frequency augmentation for transformer-based person re-identification. In CVPR (pp. 14133–14142).

  • Zhang, Q., Lai, J. H., Feng, Z., & Xie, X. (2022). Uncertainty modeling with second-order transformer for group re-identification. AAAI, 36, 3318–3325.

  • Zhang, Q., Wang, L., Patel, V. M., Xie, X., & Lai, J. (2024). View-decoupled transformer for person re-identification under aerial-ground camera network. In CVPR (pp. 22000–22009).

  • Zhang, S., Zhang, Q., Yang, Y., Wei, X., Wang, P., Jiao, B., & Zhang, Y. (2020). Person re-identification in aerial imagery. IEEE TMM, 23, 281–291.

  • Zhang, S., Yang, Y., Wang, P., Liang, G., Zhang, X., & Zhang, Y. (2021). Attend to the difference: Cross-modality person re-identification via contrastive correlation. IEEE TIP, 30, 8861–8872.

  • Zhang, T., Wei, L., Xie, L., Zhuang, Z., Zhang, Y., Li, B., & Tian, Q. (2021c). Spatiotemporal transformer for video-based person re-identification. arXiv preprint arXiv:2103.16469

  • Zhang, T., Xie, L., Wei, L., Zhuang, Z., Zhang, Y., Li, B., & Tian, Q. (2021d). Unrealperson: An adaptive pipeline towards costless person re-identification. In CVPR (pp. 11506–11515).

  • Zhang, T., Zhao, Q., Da, C., Zhou, L., Li, L., & Jiancuo, S. (2021e). Yakreid-103: A benchmark for yak re-identification. In IEEE IJCB (pp. 1–8). IEEE.

  • Zhang, X., Ge, Y., Qiao, Y., & Li, H. (2021f). Refining pseudo labels with clustering consensus over generations for unsupervised object re-identification. In CVPR (pp. 3436–3445).

  • Zhang, X., Li, D., Wang, Z., Wang, J., Ding, E., Shi, J. Q., Zhang, Z., & Wang, J. (2022c). Implicit sample extension for unsupervised person re-identification. In CVPR (pp. 7369–7378).

  • Zhang, Y., & Lu, H. (2018). Deep cross-modal projection learning for image-text matching. In ECCV (pp. 686–701).

  • Zhang, Y., Wang, Y., Li, H., & Li, S. (2022d). Cross-compatible embedding and semantic consistent feature construction for sketch re-identification. In ACM MM (pp. 3347–3355).

  • Zhang, Y., Gong, K., Zhang, K., Li, H., Qiao, Y., Ouyang, W., & Yue, X. (2023b). Meta-transformer: A unified framework for multimodal learning. arXiv preprint arXiv:2307.10802

  • Zhang, Z., Lan, C., Zeng, W., Jin, X., & Chen, Z. (2020b). Relation-aware global attention for person re-identification. In CVPR (pp. 3186–3195).

  • Zhao, J., Wang, H., Zhou, Y., Yao, R., Chen, S., & El Saddik, A. (2022). Spatial-channel enhanced transformer for visible-infrared person re-identification. IEEE TMM.

  • Zhao, Y., Zhong, Z., Yang, F., Luo, Z., Lin, Y., Li, S., & Sebe, N. (2021). Learning to generalize unseen domains via memory-based multi-source meta-learning for person re-identification. In CVPR (pp. 6277–6286).

  • Zheng, K., Liu, W., He, L., Mei, T., Luo, J., & Zha, Z. J. (2021). Group-aware label transfer for domain adaptive person re-identification. In CVPR (pp. 5310–5319).

  • Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., & Tian, Q. (2015). Scalable person re-identification: A benchmark. In ICCV (pp. 1116–1124).

  • Zheng, L., Bie, Z., Sun, Y., Wang, J., Su, C., Wang, S., & Tian, Q. (2016a). Mars: A video benchmark for large-scale person re-identification. In ECCV (pp. 868–884). Springer.

  • Zheng, L., Yang, Y., & Hauptmann, A. G. (2016b). Person re-identification: Past, present and future. arXiv preprint arXiv:1610.02984

  • Zheng, L., Zhang, H., Sun, S., Chandraker, M., Yang, Y., & Tian, Q. (2017a). Person re-identification in the wild. In CVPR (pp. 1367–1376).

  • Zheng, W., Gong, S., & Xiang, T. (2009). Associating groups of people. In BMVC (pp. 1–11).

  • Zheng, Z., Zheng, L., & Yang, Y. (2017). A discriminatively learned cnn embedding for person reidentification. ACM TOMM, 14(1), 1–20.

  • Zheng, Z., Zheng, L., & Yang, Y. (2017c). Unlabeled samples generated by gan improve the person re-identification baseline in vitro. In ICCV (pp. 3754–3762).

  • Zhong, Z., Zheng, L., Cao, D., & Li, S. (2017). Re-ranking person re-identification with k-reciprocal encoding. In CVPR (pp. 1318–1327).

  • Zhou, K., Yang, Y., Cavallaro, A., & Xiang, T. (2019). Omni-scale feature learning for person re-identification. In ICCV (pp. 3702–3712).

  • Zhou, K., Yang, J., Loy, C. C., & Liu, Z. (2022). Learning to prompt for vision-language models. IJCV, 130(9), 2337–2348.

  • Zhou, M., Liu, H., Lv, Z., Hong, W., & Chen, X. (2022b). Motion-aware transformer for occluded person re-identification. arXiv preprint arXiv:2202.04243

  • Zhu, A., Wang, Z., Li, Y., Wan, X., Jin, J., Wang, T., Hu, F., & Hua, G. (2021a). Dssl: Deep surroundings-person separation learning for text-based person retrieval. In ACM MM (pp. 209–217).

  • Zhu, H., Ke, W., Li, D., Liu, J., Tian, L., & Shan, Y. (2022a). Dual cross-attention learning for fine-grained visual categorization and object re-identification. In CVPR (pp. 4692–4702).

  • Zhu, K., Guo, H., Zhang, S., Wang, Y., Huang, G., Qiao, H., Liu, J., Wang, J., & Tang, M. (2021b). Aaformer: Auto-aligned transformer for person re-identification. arXiv preprint arXiv:2104.00921

  • Zhu, K., Guo, H., Yan, T., Zhu, Y., Wang, J., & Tang, M. (2022). Pass: Part-aware self-supervised pre-training for person re-identification. ECCV (pp. 198–214). Cham: Springer.

  • Zhuo, J., Chen, Z., Lai, J., & Wang, G. (2018). Occluded person re-identification. In ICME (pp. 1–6). IEEE.

  • Zuerl, M., Dirauf, R., Koeferl, F., Steinlein, N., Sueskind, J., Zanca, D., Brehm, I., von Fersen, L., & Eskofier, B. (2023). Polarbearvidid: A video-based re-identification benchmark dataset for polar bears. Animals, 13(5), 801.

  • Zuo, J., Yu, C., Sang, N., & Gao, C. (2023). Plip: Language-image pre-training for person representation learning. arXiv preprint arXiv:2305.08386

  • Zuo, J., Zhou, H., Nie, Y., Zhang, F., Guo, T., Sang, N., Wang, Y., & Gao, C. (2024). Ufinebench: Towards text-based person retrieval with ultra-fine granularity. In CVPR (pp. 22010–22019).

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Grants 62176188, 62361166629, 62225113).

Author information

Corresponding author

Correspondence to Mang Ye.

Additional information

Communicated by Bumsub Ham.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ye, M., Chen, S., Li, C. et al. Transformer for Object Re-identification: A Survey. Int J Comput Vis 133, 2410–2440 (2025). https://doi.org/10.1007/s11263-024-02284-4
