
Transformer for Object Re-identification: A Survey

Published in: International Journal of Computer Vision

Abstract

Object Re-identification (Re-ID) aims to identify specific objects across different times and scenes, and is a widely researched task in computer vision. For a long time, this field was predominantly driven by deep learning based on convolutional neural networks. In recent years, the emergence of Vision Transformers has spurred a growing number of studies on Transformer-based Re-ID, which have continuously broken performance records and driven significant progress in the Re-ID field. Offering a powerful, flexible, and unified solution, Transformers address a wide array of Re-ID tasks with remarkable efficacy. This paper provides a comprehensive review and in-depth analysis of Transformer-based Re-ID. Categorizing existing works into Image/Video-Based Re-ID, Re-ID with limited data/annotations, Cross-Modal Re-ID, and Special Re-ID Scenarios, we thoroughly elucidate the advantages of the Transformer in addressing the many challenges across these domains. Considering the trending unsupervised Re-ID, we propose a new Transformer baseline, UntransReID, which achieves state-of-the-art performance on both single- and cross-modal tasks. For the under-explored task of animal Re-ID, we devise a standardized experimental benchmark and conduct extensive experiments to explore the applicability of Transformers and to facilitate future research. Finally, we discuss important yet under-investigated open issues in the era of large foundation models; we believe this survey will serve as a new handbook for researchers in the field. A periodically updated website is available at https://github.com/mangye16/ReID-Survey.
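At retrieval time, the Re-ID task the abstract describes — recognizing the same object across times and scenes — reduces to a nearest-neighbour search over learned embeddings: a backbone (CNN or Transformer) maps each image to a vector, and identities are matched by similarity. The sketch below illustrates only that retrieval step with toy hand-made vectors; the `cosine` and `rank_gallery` helpers and the example embeddings are illustrative assumptions, not the paper's UntransReID model, which would produce the embeddings with a Transformer backbone.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rank_gallery(query, gallery):
    # Rank gallery embeddings by similarity to the query, best match first.
    sims = [cosine(query, g) for g in gallery]
    order = sorted(range(len(gallery)), key=lambda i: -sims[i])
    return order, sims

# Toy embeddings: gallery[0] stands in for the same identity as the query.
query = [1.0, 0.0, 0.0]
gallery = [
    [0.9, 0.1, 0.0],   # near-duplicate of the query -> same identity
    [0.0, 1.0, 0.0],   # different identity
    [0.1, 0.0, 1.0],   # different identity
]
order, sims = rank_gallery(query, gallery)
print(order[0])  # -> 0 (the matching identity is ranked first)
```

In a full Re-ID pipeline the ranking above is what metrics such as Rank-1 accuracy and mAP evaluate; the research effort surveyed in the paper goes into learning embeddings for which this simple nearest-neighbour match succeeds.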


[Figures 1–8 appear in the full article.]


Data Availability

The authors declare that publicly available datasets are used for evaluating object Re-ID methods. The data supporting the experiments conducted in this study can be found in the paper. All person-related datasets used in this study were obtained and used in accordance with appropriate permissions and ethical guidelines.

References

  • (2022). Beluga id 2022. https://lila.science/datasets/beluga-id-2022/

  • (2022). Hyena id 2022. https://lila.science/datasets/hyena-id-2022/

  • (2022). Leopard id 2022. https://lila.science/datasets/leopard-id-2022/

  • Ahmed, E., Jones, M., & Marks, T. K. (2015). An improved deep learning architecture for person re-identification. In CVPR (pp. 3908–3916).

  • Bai. Y., Jiao, J., Ce, W., Liu, J., Lou, Y., Feng, X., & Duan, L. Y. (2021a). Person30k: A dual-meta generalization network for person re-identification. In CVPR (pp. 2123–2132).

  • Bai, Z., Wang, Z., Wang, J., Hu, D., & Ding, E. (2021b). Unsupervised multi-source domain adaptation for person re-identification. In CVPR (pp. 12914–12923).

  • Bergamini, L., Porrello, A., Dondona, A. C., Del Negro, E., Mattioli, M., D’alterio, N., & Calderara, S. (2018). Multi-views embedding for cattle re-identification. In IEEE SITIS (pp. 184–191).

  • Bouma, S., Pawley, M. D., Hupman, K., & Gilman, A. (2018). Individual common dolphin identification via metric embedding learning. In IEEE IVCNZ (pp. 1–6).

  • Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. (2020). Language models are few-shot learners. NeurIPS, 33, 1877–1901.

    Google Scholar 

  • Bruslund Haurum, J., Karpova, A., Pedersen, M., Hein Bengtson, S., & Moeslund, T. B. (2020). Re-identification of zebrafish using metric learning. In WACV workshop (pp. 1–11).

  • Cao, J., Pang, Y., Anwer, R. M., Cholakkal, H., Xie, J., Shah, M., & Khan, F. S. (2022). Pstr: End-to-end one-step person search with transformers. In CVPR (pp. 9458–9467).

  • Cao, M., Bai, Y., Zeng, Z., Ye, M., & Zhang, M. (2024). An empirical study of clip for text-based person search. AAAI, 38, 465–473.

    Google Scholar 

  • Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., & Joulin, A. (2021). Emerging properties in self-supervised vision transformers. In ICCV (pp. 9650–9660).

  • Chan, J., Carrión, H., Mégret, R., Agosto-Rivera, J. L., & Giray, T. (2022). Honeybee re-identification in video: New datasets and impact of self-supervision. In VISIGRAPP (5: VISAPP) (pp. 517–525).

  • Cheeseman, T., Southerland, K., Park, J., Olio, M., Flynn, K., Calambokidis, J., Jones, L., Garrigue, C., Frisch Jordan, A., Howard, A., et al. (2022). Advanced image recognition: A fully automated, high-accuracy photo-identification matching system for humpback whales. Mammalian Biology, 102(3), 915–929.

    Google Scholar 

  • Chen, B., Deng, W., & Hu, J. (2019). Mixed high-order attention network for person re-identification. In ICCV (pp 371–381).

  • Chen, C., Ye, M., Qi, M., & Du, B. (2022a). Sketch transformer: Asymmetrical disentanglement learning from dynamic synthesis. In ACM MM (pp. 4012–4020).

  • Chen, C., Ye, M., Qi, M., Wu, J., Jiang, J., & Lin, C. W. (2022). Structure-aware positional transformer for visible-infrared person re-identification. IEEE TIP, 31, 2352–2364.

    Google Scholar 

  • Chen, C., Ye, M., & Jiang, D. (2023a). Towards modality-agnostic person re-identification with descriptive query. In CVPR (pp. 15128–15137).

  • Chen, H., Lagadec, B., & Bremond, F. (2021a). Ice: Inter-instance contrastive encoding for unsupervised person re-identification. In ICCV (pp. 14960–14969).

  • Chen, S., Ye, M., & Du, B. (2022c). Rotation invariant transformer for recognizing object in UAVs. In ACM MM (pp. 2565–2574).

  • Chen, W., Xu, X., Jia, J., Luo, H., Wang, Y., Wang, F., Jin, R., & Sun, X. (2023b). Beyond appearance: A semantic controllable self-supervised learning framework for human-centric visual tasks. In CVPR (pp. 15050–15061).

  • Chen, X., Xu, C., Cao, Q., Xu, J., Zhong, Y., Xu, J., Li, Z., Wang, J., & Gao, S. (2021b). Oh-former: Omni-relational high-order transformer for person re-identification. arXiv preprint arXiv:2109.11159

  • Chen, Y. C., Zhu, X., Zheng, W. S., & Lai, J. H. (2017). Person re-identification by camera correlation aware feature augmentation. IEEE TPAMI, 40(2), 392–408.

    Google Scholar 

  • Cheng, D., Zhou, J., Wang, N., & Gao, X. (2022). Hybrid dynamic contrast and probability distillation for unsupervised person re-id. IEEE TIP, 31, 3334–3346.

    Google Scholar 

  • Cheng, D., Huang, X., Wang, N., He, L., Li, Z., & Gao, X. (2023a). Unsupervised visible-infrared person reid by collaborative learning with neighbor-guided label refinement. In ACM MM (pp. 7085–7093).

  • Cheng, D., Wang, G., Wang, B., Zhang, Q., Han, J., & Zhang, D. (2023). Hybrid routing transformer for zero-shot learning. Pattern Recognition, 137, 109270.

    Google Scholar 

  • Cheng, D., Wang, G., Wang, N., Zhang, D., Zhang, Q., & Gao, X. (2023). Discriminative and robust attribute alignment for zero-shot learning. IEEE TCSVT, 33(8), 4244–4256.

    Google Scholar 

  • Cheng, D., Li, Y., Zhang, D., Wang, N., Sun, J., & Gao, X. (2024). Progressive negative enhancing contrastive learning for image dehazing and beyond. In IEEE TMM.

  • Cheng, X., Jia, M., Wang, Q., & Zhang, J. (2022b). More is better: Multi-source dynamic parsing attention for occluded person re-identification. In ACM MM (pp. 6840–6849).

  • Cho, Y., Kim, W. J., Hong, S., & Yoon, S. E. (2022). Part-based pseudo label refinement for unsupervised person re-identification. In CVPR (pp. 7308–7318).

  • Choi, S., Kim, T., Jeong, M., Park, H., & Kim, C. (2021). Meta batch-instance normalization for generalizable person re-identification. In CVPR (pp. 3425–3435).

  • Ci, Y., Wang, Y., Chen, M., Tang, S., Bai, L., Zhu, F., Zhao, R., Yu, F., Qi, D., & Ouyang, W. (2023). Unihcp: A unified model for human-centric perceptions. In CVPR (pp. 17840–17852).

  • Comandur, B. (2022). Sports re-id: Improving re-identification of players in broadcast videos of team sports. arXiv preprint arXiv:2206.02373

  • Dai, Y., Liu, J., Sun, Y., Tong, Z., Zhang, C., & Duan, L. Y. (2021). Idm: An intermediate domain module for domain adaptive person re-id. In ICCV (pp. 11864–11874).

  • Dai, Z., Wang, G., Yuan, W., Zhu, S., & Tan, P. (2022). Cluster contrast for unsupervised person re-identification. In ACCV (pp. 1142–1160).

  • Dehghani, M., Djolonga, J., Mustafa, B., Padlewski, P., Heek, J., Gilmer, J., Steiner, A. P., Caron, M., Geirhos, R., & Alabdulmohsin, I., et al. (2023). Scaling vision transformers to 22 billion parameters. In ICML (pp. 7480–7512). PMLR.

  • Deng, W., Zheng, L., Ye, Q., Kang, G., Yang, Y., & Jiao, J. (2018). Image-image domain adaptation with preserved self-similarity and domain-dissimilarity for person re-identification. In CVPR (pp. 994–1003).

  • Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805

  • Ding, N., Qin, Y., Yang, G., Wei, F., Yang, Z., Su, Y., Hu, S., Chen, Y., Chan, C. M., Chen, W., et al. (2023). Parameter-efficient fine-tuning of large-scale pre-trained language models. Nature Machine Intelligence, 5(3), 220–235.

    Google Scholar 

  • Ding, Z., Ding, C., Shao, Z., & Tao, D. (2021). Semantically self-aligned network for text-to-image part-aware person re-identification. arXiv preprint arXiv:2107.12666

  • Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., & Gelly, S., et al. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. In ICLR.

  • Fan, L., Li, T., Fang, R., Hristov, R., Yuan, Y., & Katabi, D. (2020). Learning longterm representations for person re-identification using radio signals. In CVPR (pp. 10699–10709).

  • Farooq, A., Awais, M., Kittler, J., & Khalid, S. S. (2022). Axm-net: Implicit cross-modal feature alignment for person re-identification. AAAI, 36, 4477–4485.

    Google Scholar 

  • Feng, Y., Yu, J., Chen, F., Ji, Y., Wu, F., Liu, S., & Jing, X. Y. (2022). Visible-infrared person re-identification via cross-modality interaction transformer. In IEEE TMM.

  • Ferdous, S. N., Li, X., & Lyu, S. (2022). Uncertainty aware multitask pyramid vision transformer for uav-based object re-identification. In ICIP (pp. 2381–2385). IEEE.

  • Fu, D., Chen, D., Bao, J., Yang, H., Yuan, L., Zhang, L., Li, H., & Chen, D. (2021). Unsupervised pre-training for person re-identification. In CVPR (pp. 14750–14759).

  • Gao, J., Burghardt, T., Andrew, W., Dowsey, A. W., & Campbell, N. W. (2021). Towards self-supervision for video identification of individual holstein-friesian cattle: The cows2021 dataset. arXiv preprint arXiv:2105.01938

  • Ge, Y., Zhu, F., Chen, D., Zhao, R., et al. (2020). Self-paced contrastive learning with hybrid memory for domain adaptive object re-id. NeurIPS, 33, 11309–11321.

    Google Scholar 

  • Gray, D., Brennan, S., & Tao, H. (2007). Evaluating appearance models for recognition, reacquisition, and tracking. PETS, 3, 1–7.

    Google Scholar 

  • Gu, J., Luo, H., Wang, K., Jiang, W., You, Y., & Zhao, J. (2023). Color prompting for data-free continual unsupervised domain adaptive person re-identification. arXiv preprint arXiv:2308.10716

  • Guo, H., Zhu, K., Tang, M., & Wang, J. (2019). Two-level attention network with multi-grain ranking loss for vehicle re-identification. IEEE TIP, 28(9), 4328–4338.

    MathSciNet  Google Scholar 

  • Guo, P., Liu, H., Wu, J., Wang, G., & Wang, T. (2023). Semantic-aware consistency network for cloth-changing person re-identification. arXiv preprint arXiv:2308.14113

  • Han, K., Wang, Y., Chen, H., Chen, X., Guo, J., Liu, Z., Tang, Y., Xiao, A., Xu, C., Xu, Y., et al. (2022). A survey on vision transformer. IEEE TPAMI, 45(1), 87–110.

    Google Scholar 

  • Han, X., He, S., Zhang, L., & Xiang, T. (2021). Text-based person search with limited data. arXiv:2110.10807

  • He, B., Li, J., Zhao, Y., & Tian, Y. (2019). Part-regularized near-duplicate vehicle re-identification. In CVPR (pp. 3997–4005).

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In CVPR (pp. 770–778).

  • He, K., Chen, X., Xie, S., Li, Y., Dollár, P., & Girshick, R. (2022). Masked autoencoders are scalable vision learners. In CVPR (pp. 16000–16009).

  • He, S., Luo, H., Wang, P., Wang, F., Li, H., & Jiang, W. (2021a). Transreid: Transformer-based object re-identification. In ICCV (pp. 15013–15022).

  • He, S., Chen, W., Wang, K., Luo, H., Wang, F., Jiang, W., & Ding, H. (2023a). Region generation and assessment network for occluded person re-identification. In IEEE TIFS.

  • He, S., Luo, H., Jiang, W., Jiang, X., & Ding, H. (2023). Vgsg: Vision-guided semantic-group network for text-based person search. IEEE TIP, 33, 163–176.

    Google Scholar 

  • He, T., Jin, X., Shen, X., Huang, J., Chen, Z., Hua, X. S. (2021b). Dense interaction learning for video-based person re-identification. In ICCV (pp. 1490–1501).

  • He, T., Shen, X., Huang, J., Chen, Z., & Hua, X. S. (2021c). Partial person re-identification with part-part correspondence learning. In CVPR (pp. 9105–9115).

  • He, W., Deng, Y., Tang, S., Chen, Q., Xie, Q., Wang, Y., Bai, L., Zhu, F., Zhao, R., & Ouyang, W., et al. (2024). Instruct-reid: A multi-purpose person re-identification task with instructions. In CVPR (pp. 17521–17531).

  • Hermans, A., Beyer, L., & Leibe, B. (2017). In defense of the triplet loss for person re-identification. arXiv preprint arXiv:1703.07737

  • Hong, P., Wu, T., Wu, A., Han, X., & Zheng, W. S. (2021). Fine-grained shape-appearance mutual learning for cloth-changing person re-identification. In CVPR (pp. 10513–10522).

  • Howard, A., Ken, I., Southerland Holbrook. R., & Cheeseman, T. (2022). Happywhale - whale and dolphin identification. https://kaggle.com/competitions/happy-whale-and-dolphin

  • Jia, M., Cheng, X., Lu, S., & Zhang, J. (2022). Learning disentangled representation implicitly via transformer for occluded person re-identification. IEEE TMM, 25, 1294–1305.

    Google Scholar 

  • Jia, X., Zhong, X., Ye, M., Liu, W., & Huang, W. (2022). Complementary data augmentation for cloth-changing person re-identification. IEEE TIP, 31, 4227–4239.

    Google Scholar 

  • Jiang, D., & Ye, M. (2023). Cross-modal implicit relation reasoning and aligning for text-to-image person retrieval. In CVPR (pp. 2787–2797).

  • Jiang, K., Zhang, T., Liu, X., Qian, B., Zhang, Y., & Wu, F. (2022). Cross-modality transformer for visible-infrared person re-identification. In ECCV (pp. 480–496). Springer.

  • Jiao, B., Liu, L., Gao, L., Wu, R., Lin, G., Wang, P., & Zhang, Y. (2023). Toward re-identifying any animal. In NeurIPS.

  • Jin, X., Lan, C., Zeng, W., Chen, Z., & Zhang, L. (2020). Style normalization and restitution for generalizable person re-identification. In CVPR (pp. 3143–3152).

  • Jin, X., He, T., Zheng, K., Yin, Z., Shen, X., Huang, Z., Feng, R., Huang, J., Chen, Z., & Hua, X. S. (2022). Cloth-changing person re-identification from a single image with gait prediction and regularization. In CVPR (pp. 14278–14287).

  • Kalayeh, M. M., Basaran, E., Gökmen, M., Kamasak, M. E., & Shah, M. (2018). Human semantic parsing for person re-identification. In CVPR (pp. 1062–1071).

  • Khan, S. D., & Ullah, H. (2019). A survey of advances in vision-based vehicle re-identification. CVIU, 182, 50–63.

    Google Scholar 

  • Khorramshahi, P., Kumar, A., Peri, N., Rambhatla, S. S., Chen, J. C., & Chellappa, R. (2019). A dual-path model with adaptive attention for vehicle re-identification. In ICCV (pp. 6132–6141).

  • Koch, G., Zemel, R., & Salakhutdinov, R., et al. (2015). Siamese neural networks for one-shot image recognition. In ICML workshop (vol. 2). Lille.

  • Konovalov, D. A., Hillcoat, S., Williams, G., Birtles, R. A., Gardiner, N., & Curnock, M. I. (2018). Individual minke whale recognition using deep learning convolutional neural networks. Journal of Geoscience and Environment Protection, 6, 25–36.

    Google Scholar 

  • Korschens, M., & Denzler, J. (2019). Elpephants: A fine-grained dataset for elephant re-identification. In ICCV workshop.

  • Kumar, S., Yaghoubi, E., Das, A., Harish, B., & Proença, H. (2020). The p-destre: A fully annotated dataset for pedestrian detection, tracking, re-identification and search from aerial devices. arXiv preprint arXiv:2004.02782

  • Kuncheva, L. I., Williams, F., Hennessey, S. L., & Rodríguez, J. J. (2022). A benchmark database for animal re-identification and tracking. In IEEE IPAS (pp. 1–6). IEEE.

  • Lai, S., Chai, Z., & Wei, X. (2021). Transformer meets part model: Adaptive part division for person re-identification. In ICCV (pp. 4150–4157).

  • Lee, K. W., Jawade, B., Mohan, D., Setlur, S., & Govindaraju, V. (2022). Attribute de-biased vision transformer (ad-vit) for long-term person re-identification. In IEEE AVSS (pp. 1–8) . IEEE.

  • Li, H., Li, C., Zhu, X., Zheng, A., & Luo, B. (2020). Multi-spectral vehicle re-identification: A challenge. AAAI, 34, 11345–11353.

    Google Scholar 

  • Li, H., Wu, G., & Zheng, W. S. (2021a). Combined depth space based architecture search for person re-identification. In CVPR (pp. 6729–6738).

  • Li, H., Ye, M., & Du, B. (2021b). Weperson: Learning a generalized re-identification model from all-weather virtual data. In ACM MM (pp. 3115–3123).

  • Li, H., Li, C., Zheng, A., Tang, J., & Luo, B. (2022). Mskat: Multi-scale knowledge-aware transformer for vehicle re-identification. IEEE TITS, 23(10), 19557–19568.

    Google Scholar 

  • Li, H., Ye, M., Wang, C., & Du, B. (2022b). Pyramidal transformer with conv-patchify for person re-identification. In ACM MM (pp. 7317–7326).

  • Li, H., Ye, M., Zhang, M., Du, B. (2024a). All in one framework for multimodal re-identification in the wild. In CVPR (pp. 17459–17469).

  • Li, M., Zhu, X., & Gong, S. (2019). Unsupervised tracklet person re-identification. IEEE TPAMI, 42(7), 1770–1782.

    Google Scholar 

  • Li, S., Xiao, T., Li, H., Zhou, B., Yue, D., & Wang, X. (2017). Person search with natural language description. In CVPR (pp. 1970–1979).

  • Li, S., Li, J., Tang, H., Qian, R., & Lin, W. (2019b). Atrw: A benchmark for amur tiger re-identification in the wild. arXiv preprint arXiv:1906.05586

  • Li, S., Fu, L., Sun, Y., Mu, Y., Chen, L., Li, J., & Gong, H. (2021). Individual dairy cow identification based on lightweight convolutional neural network. Plos one, 16(11), e0260510.

    Google Scholar 

  • Li, S., Sun, L., & Li, Q. (2023). Clip-reid: Exploiting vision-language model for image re-identification without concrete text labels. AAAI, 37, 1405–1413.

    Google Scholar 

  • Li, T., Liu, J., Zhang, W., Ni, Y., Wang, W., & Li, Z. (2021d). Uav-human: A large benchmark for human behavior understanding with unmanned aerial vehicles. In CVPR (pp. 16266–16275).

  • Li, W., Zhao, R., Xiao, T., & Wang, X. (2014). Deepreid: Deep filter pairing neural network for person re-identification. In CVPR (pp. 152–159).

  • Li, W., Zhu, X., & Gong, S. (2018). Harmonious attention network for person re-identification. In CVPR (pp. 2285–2294).

  • Li, W., Zou, C., Wang, M., Xu, F., Zhao, J., Zheng, R., Cheng, Y., & Chu, W. (2023b). Dc-former: Diverse and compact transformer for person re-identification. arXiv preprint arXiv:2302.14335

  • Li, Y., He, J., Zhang, T., Liu, X., Zhang, Y., & Wu, F. (2021e). Diverse part discovery: Occluded person re-identification with part-aware transformer. In CVPR (pp. 2898–2907).

  • Li, Y., Liu, Y., Zhang, H., Zhao, C., Wei, Z., & Miao, D. (2024b). Occlusion-aware transformer with second-order attention for person re-identification. IEEE TIP

  • Liang, T., Jin, Y., Liu, W., & Li, Y. (2023). Cross-modality transformer with modality mining for visible-infrared person re-identification. IEEE TMM

  • Liao, S., & Shao, L. (2021). Transmatcher: Deep image matching through transformers for generalizable person re-identification. NeurIPS, 34, 1992–2003.

    Google Scholar 

  • Liao, S., Hu, Y., Zhu, X., & Li, S. Z. (2015). Person re-identification by local maximal occurrence representation and metric learning. In CVPR (pp. 2197–2206).

  • Lin, W., Li, Y., Xiao, H., See, J., Zou, J., Xiong, H., Wang, J., & Mei, T. (2019). Group reidentification with multigrained matching and integration. IEEE transactions on cybernetics, 51(3), 1478–1492.

    Google Scholar 

  • Lin, Y., Dong, X., Zheng, L., Yan, Y., & Yang, Y. (2019). A bottom-up clustering approach to unsupervised person re-identification. AAAI, 33, 8738–8745.

    Google Scholar 

  • Lin, Y., Xie, L., Wu, Y., Yan, C., & Tian, Q. (2020). Unsupervised person re-identification via softened similarity learning. In CVPR (pp. 3390–3399).

  • Liu, F., Ye, M., & Du, B. (2023a). Dual level adaptive weighting for cloth-changing person re-identification. IEEE TIP

  • Liu, H., Jie, Z., Jayashree, K., Qi, M., Jiang, J., Yan, S., & Feng, J. (2017). Video-based person re-identification with accumulative motion context. IEEE Transactions on Circuits and Systems for Video Technology, 28(10), 2788–2802.

    Google Scholar 

  • Liu, X., Liu, W., Ma, H., & Fu, H. (2016a). Large-scale vehicle re-identification in urban surveillance videos. In ICME (pp. 1–6). IEEE.

  • Liu, X., Liu, W., Mei, T., & Ma, H. (2016b). A deep learning-based approach to progressive vehicle re-identification for urban surveillance. In ECCV (pp. 869–884). Springer.

  • Liu, X., Zhang, P., Yu, C., Lu, H., Qian, X., & Yang, X. (2021a). A video is worth three views: Trigeminal transformers for video-based person re-identification. arXiv preprint arXiv:2104.01745

  • Liu, X., Yu, C., Zhang, P., & Lu, H. (2023b). Deeply coupled convolution–transformer with spatial–temporal complementary learning for video-based person re-identification. In IEEE TNNLS.

  • Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., & Guo, B. (2021b). Swin transformer: Hierarchical vision transformer using shifted windows. arXiv preprint arXiv:2103.14030

  • Lou, Y., Bai, Y., Liu, J., Wang, S., & Duan, L. (2019). Veri-wild: A large dataset and a new method for vehicle re-identification in the wild. In CVPR (pp. 3235–3243).

  • Luo, H., Jiang, W., Gu, Y., Liu, F., Liao, X., Lai, S., & Gu, J. (2019). A strong baseline and batch normalization neck for deep person re-identification. IEEE TMM, 22(10), 2597–2609.

    Google Scholar 

  • Luo, H., Wang, P., Xu, Y., Ding, F., Zhou, Y., Wang, F., Li, H., & Jin, R. (2021). Self-supervised pre-training for transformer-based person re-identification. arXiv preprint arXiv:2111.12084

  • Mallat, S. G. (1989). A theory for multiresolution signal decomposition: The wavelet representation. IEEE TPAMI, 11(7), 674–693.

    Google Scholar 

  • Mao, J., Yao, Y., Sun, Z., Huang, X., Shen, F., & Shen, H. T. (2023). Attention map guided transformer pruning for occluded person re-identification on edge device. In IEEE TMM.

  • McLaughlin, N., Del Rincon, J. M., & Miller, P. (2016). Recurrent convolutional network for video-based person re-identification. In CVPR (pp. 1325–1334).

  • Meng, D., Li, L., Liu, X., Li, Y., Yang, S., Zha, Z. J., Gao, X., Wang, S., Huang, Q. (2020). Parsing-based view-aware embedding network for vehicle re-identification. In CVPR (pp. 7103–7112).

  • Miao, J., Wu, Y., Liu, P., Ding, Y., & Yang, Y. (2019). Pose-guided feature alignment for occluded person re-identification. In ICCV (pp. 542–551).

  • Moskvyak, O., Maire, F., Dayoub, F., & Baktashmotlagh, M. (2020). Learning landmark guided embeddings for animal re-identification. In WACV workshop (pp. 12–19).

  • Moskvyak, O., Maire, F., Dayoub, F., Armstrong, A. O., & Baktashmotlagh, M. (2021). Robust re-identification of manta rays from natural markings by learning pose invariant embeddings. In DICTA (pp. 1–8). IEEE.

  • Naseer, M., Ranasinghe, K., Khan, S., Hayat, M., Khan, F. S., & Yang, M. H. (2021). Intriguing properties of vision transformers. arXiv preprint arXiv:2105.10497

  • Nepovinnykh, E., Eerola, T., & Kalviainen, H. (2020). Siamese network based pelage pattern matching for ringed seal re-identification. In WACV workshop (pp. 25–34).

  • Nepovinnykh, E., Eerola, T., Biard, V., Mutka, P., Niemi, M., Kunnasranta, M., & Kälviäinen, H. (2022). Sealid: Saimaa ringed seal re-identification dataset. Sensors, 22(19), 7602.

    Google Scholar 

  • Nguyen, D. T., Hong, H. G., Kim, K. W., & Park, K. R. (2017). Person recognition system based on a combination of body images from visible light and thermal cameras. Sensors, 17(3), 605.

    Google Scholar 

  • Ni, H., Song, J., Luo, X., Zheng, F., Li, W., & Shen, H. T. (2022). Meta distribution alignment for generalizable person re-identification. In CVPR (pp. 2487–2496).

  • Ni, H., Li, Y., Gao, L., Shen, H. T., & Song, J. (2023). Part-aware transformer for generalizable person re-identification. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 11280–11289).

  • Niu, K., Huang, Y., Ouyang, W., & Wang, L. (2020). Improving description-based person re-identification by multi-granularity image-text alignments. In IEEE TIP (pp. 5542–5556).

  • Organisciak, D., Poyser, M., Alsehaim, A., Hu, S., Isaac-Medina, B. K., Breckon, T. P., Shum, H. P. (2021). Uav-reid: A benchmark on unmanned aerial vehicle re-identification in video imagery. arXiv preprint arXiv:2104.06219

  • Pang, L., Wang, Y., Song, Y. Z., Huang, T., Tian, Y. (2018). Cross-domain adversarial feature learning for sketch re-identification. In ACM MM (pp. 609–617).

  • Papafitsoros, K., Adam, L., Čermák, V., & Picek, L. (2022). Seaturtleid: A novel long-span dataset highlighting the importance of timestamps in wildlife re-identification. arXiv preprint arXiv:2211.10307

  • Parham, J., Crall, J., Stewart, C., Berger-Wolf, T., Rubenstein, D. I. (2017). Animal population censusing at scale with citizen science and photographic identification. In AAAI.

  • Park, H., & Ham, B. (2020). Relation network for person re-identification. AAAI, 34, 11839–11847.

    Google Scholar 

  • Porrello, A., Bergamini, L., & Calderara, S. (2020). Robust re-identification by multiple views knowledge distillation. In ECCV (pp. 93–110). Springer.

  • Pu, N., Zhong, Z., Sebe, N., Lew, M. S. (2023). A memorizing and generalizing framework for lifelong person re-identification. In IEEE TPAMI

  • Qian, W., Luo, H., Peng, S., Wang, F., Chen, C., & Li, H. (2022). Unstructured feature decoupling for vehicle re-identification. In ECCV (pp. 336–353).

  • Qian, X., Wang, W., Zhang, L., Zhu, F., Fu, Y., Xiang, T., Jiang, Y. G., & Xue, X. (2020). Long-term cloth-changing person re-identification. In ACCV.

  • Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., & Clark, J., et al. (2021). Learning transferable visual models from natural language supervision. In ICML (pp. 8748–8763). PMLR.

  • Rao, H., & Miao, C. (2023). Transg: Transformer-based skeleton graph prototype contrastive learning with structure-trajectory prompted reconstruction for person re-identification. In CVPR (pp. 22118–22128).

  • Rao, H., Wang, S., Hu, X., Tan, M., Guo, Y., Cheng, J., Liu, X., & Hu, B. (2021). A self-supervised gait encoding approach with locality-awareness for 3d skeleton based person re-identification. IEEE TPAMI, 44(10), 6649–6666.

    Google Scholar 

  • Rao, H., Leung, C., & Miao, C. (2024). Hierarchical skeleton meta-prototype contrastive learning with hard skeleton mining for unsupervised person re-identification. IJCV, 132(1), 238–260.

    Google Scholar 

  • Sarafianos, N., Xu, X., & Kakadiaris, I. A. (2019). Adversarial representation learning for text-to-image matching. In ICCV (pp. 5814–5824).

  • Schneider, S., Taylor, G. W., Linquist, S., & Kremer, S. C. (2019). Past, present and future approaches using computer vision for animal re-identification from camera trap data. Methods in Ecology and Evolution, 10(4), 461–470.

    Google Scholar 

  • Shao, Z., Zhang, X., Fang, M., Lin, Z., Wang, J., & Ding, C. (2022). Learning granularity-unified representations for text-to-image person re-identification. In ACM MM (pp. 5566–5574).

  • Shao, Z., Zhang, X., Ding, C., Wang, J., & Wang, J. (2023). Unified pre-training with pseudo texts for text-to-image person re-identification. In ICCV (pp. 11174–11184).

  • Shen, F., Xie, Y., Zhu, J., Zhu, X., & Zeng, H. (2023). Git: Graph interactive transformer for vehicle re-identification. IEEE TIP, 32, 1039–1051.

    Google Scholar 

  • Shen, L., He, T., Guo, Y., & Ding, G. (2023b). X-reid: Cross-instance transformer for identity-level person re-identification. arXiv preprint arXiv:2302.02075

  • Shu, X., Wen, W., Wu, H., Chen, K., Song, Y., Qiao, R., Ren, B., & Wang, X. (2022). See finer, see more: Implicit modality alignment for text-based person retrieval. In ECCV (pp. 624–641). Springer.

  • Song, G., Leng, B., Liu, Y., Hetang, C., & Cai, S. (2018). Region-based quality estimation network for large-scale person re-identification. In AAAI (vol. 32).

  • Su, C., Li, J., Zhang, S., Xing, J., Gao, W., & Tian, Q. (2017). Pose-driven deep convolutional model for person re-identification. In ICCV (pp. 3960–3969).

  • Suh, Y., Wang, J., Tang, S., Mei, T., & Lee, K. M. (2018). Part-aligned bilinear representations for person re-identification. In ECCV (pp. 402–419).

  • Sun, C. C., Arr, G. S., Ramachandran, R. P., & Ritchie, S. G. (2004). Vehicle reidentification using multidetector fusion. IEEE TITS, 5(3), 155–164.

    Google Scholar 

  • Sun, X., & Zheng, L. (2019). Dissecting person re-identification from the viewpoint of viewpoint. In CVPR (pp. 608–617).

  • Sun, Y., Zheng, L., Yang, Y., Tian, Q., & Wang, S. (2018). Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline). In ECCV (pp. 480–496)

  • Tan, B., Xu, L., Qiu, Z., Wu, Q., & Meng, F. (2023). Mfat: A multi-level feature aggregated transformer for person re-identification. In ICASSP (pp. 1–5). IEEE.

  • Tan, W., Ding, C., Jiang, J., Wang, F., Zhan, Y., & Tao, D. (2024). Harnessing the power of mllms for transferable text-to-image person reid. In CVPR (pp. 17127–17137).

  • Tang, S., Chen, C., Xie, Q., Chen, M., Wang, Y., Ci, Y., Bai, L., Zhu, F., Yang, H., & Yi, L., et al. (2023). Humanbench: Towards general human-centric perception with projector assisted pretraining. In CVPR (pp. 21970–21982).

  • Tang, Z., Naphade, M., Liu, M. Y., Yang, X., Birchfield, S., Wang, S., Kumar, R., Anastasiu, D., & Hwang, J. N. (2019). Cityflow: A city-scale benchmark for multi-target multi-camera vehicle tracking and re-identification. In CVPR (pp. 8797–8806).

  • Tang, Z., Zhang, R., Peng, Z., Chen, J., & Lin, L. (2022). Multi-stage spatio-temporal aggregation transformer for video person re-identification. In IEEE TMM.

  • Teng, S., Zhang, S., Huang, Q., & Sebe, N. (2021). Viewpoint and scale consistency reinforcement for uav vehicle re-identification. IJCV, 129, 719–735.

    Google Scholar 

  • Tian, X., Liu, J., Zhang, Z., Wang, C., Qu, Y., Xie, Y., & Ma, L. (2022). Hierarchical walking transformer for object re-identification. In ACM MM (pp. 4224–4232).

  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In NeurIPS (vol. 30).

  • Walmer, M., Suri, S., Gupta, K., & Shrivastava, A. (2023). Teaching matters: Investigating the role of supervision in vision transformers. In CVPR (pp. 7486–7496).

  • Wang, D., & Zhang, S. (2020). Unsupervised person re-identification via multi-label classification. In CVPR (pp. 10981–10990).

  • Wang, G., Zhang, T., Cheng, J., Liu, S., Yang, Y., & Hou, Z. (2019a). Rgb-infrared cross-modality person re-identification via joint pixel and feature alignment. In ICCV (pp. 3623–3632).

  • Wang, G., Yang, S., Liu, H., Wang, Z., Yang, Y., Wang, S., Yu, G., Zhou, E., & Sun, J. (2020a). High-order information matters: Learning relation and topology for occluded person re-identification. In CVPR (pp. 6449–6458).

  • Wang, G., Yu, F., Li, J., Jia, Q., & Ding, S. (2023a). Exploiting the textual potential from vision-language pre-training for text-based person search. arXiv preprint arXiv:2303.04497

  • Wang, G. A., Zhang, T., Yang, Y., Cheng, J., Chang, J., Liang, X., & Hou, Z. G. (2020). Cross-modality paired-images generation for rgb-infrared person re-identification. AAAI, 34, 12144–12151.

  • Wang, H., Shen, J., Liu, Y., Gao, Y., & Gavves, E. (2022a). Nformer: Robust person re-identification with neighbor transformer. In CVPR (pp. 7297–7307).

  • Wang, J., Zhang, Z., Chen, M., Zhang, Y., Wang, C., Sheng, B., Qu, Y., & Xie, Y. (2022b). Optimal transport for label-efficient visible-infrared person re-identification. In ECCV (pp. 93–109). Springer.

  • Wang, L., Ding, R., Zhai, Y., Zhang, Q., Tang, W., Zheng, N., & Hua, G. (2021). Giant panda identification. IEEE TIP, 30, 2837–2849.

  • Wang, P., Jiao, B., Yang, L., Yang, Y., Zhang, S., Wei, W., & Zhang, Y. (2019b). Vehicle re-identification in aerial imagery: Dataset and approach. In ICCV (pp. 460–469).

  • Wang, T., Liu, H., Song, P., Guo, T., & Shi, W. (2022). Pose-guided feature disentangling for occluded person re-identification based on transformer. AAAI, 36, 2540–2549.

  • Wang, T., Liu, H., Li, W., Ban, M., Guo, T., & Li, Y. (2023b). Feature completion transformer for occluded person re-identification. arXiv preprint arXiv:2303.01656

  • Wang, W., Xie, E., Li, X., Fan, D. P., Song, K., Liang, D., Lu, T., Luo, P., & Shao, L. (2021b). Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. arXiv preprint arXiv:2102.12122

  • Wang, X., Wang, X., Jiang, B., & Luo, B. (2023c). Few-shot learning meets transformer: Unified query-support transformers for few-shot classification. IEEE TCSVT.

  • Wang, Y., Qi, G., Li, S., Chai, Y., & Li, H. (2022). Body part-level domain alignment for domain-adaptive person re-identification with transformer framework. IEEE TIFS, 17, 3321–3334.

  • Wang, Z., Wang, Z., Zheng, Y., Chuang, Y. Y., & Satoh, S. (2019c). Learning to reduce dual-level discrepancy for infrared-visible person re-identification. In CVPR (pp. 618–626).

  • Wang, Z., Wang, Z., Zheng, Y., Wu, Y., Zeng, W., & Satoh, S. (2019d). Beyond intra-modality: A survey of heterogeneous person re-identification. arXiv preprint arXiv:1905.10048

  • Wang, Z., Fang, Z., Wang, J., & Yang, Y. (2020c). Vitaa: Visual-textual attributes alignment in person search by natural language. In ECCV (pp. 402–420). Springer.

  • Wei, L., Zhang, S., Gao, W., & Tian, Q. (2018). Person transfer gan to bridge domain gap for person re-identification. In CVPR (pp. 79–88).

  • Wei, R., Gu, J., He, S., & Jiang, W. (2022). Transformer-based domain-specific representation for unsupervised domain adaptive vehicle re-identification. IEEE TITS, 24(3), 2935–2946.

  • Weideman, H., Stewart, C., Parham, J., Holmberg, J., Flynn, K., Calambokidis, J., Paul, D. B., Bedetti, A., Henley, M., & Pope, F., et al. (2020). Extracting identifying contours for African elephants and humpback whales using a learned appearance model. In WACV (pp. 1276–1285).

  • Weideman, H. J., Jablons, Z. M., Holmberg, J., Flynn, K., Calambokidis, J., Tyson, R. B., Allen, J. B., Wells, R. S., Hupman, K., & Urian, K., et al. (2017). Integral curvature representation and matching algorithms for identification of dolphins and whales. In ICCV workshop (pp. 2831–2839).

  • Wu, A., Zheng, W. S., Yu, H. X., Gong, S., & Lai, J. (2017). Rgb-infrared cross-modality person re-identification. In ICCV (pp. 5380–5389).

  • Wu, J., He, L., Liu, W., Yang, Y., Lei, Z., Mei, T., & Li, S. Z. (2022a). Cavit: Contextual alignment vision transformer for video object re-identification. In ECCV (pp. 549–566). Springer.

  • Wu, L., Liu, D., Zhang, W., Chen, D., Ge, Z., Boussaid, F., Bennamoun, M., & Shen, J. (2022). Pseudo-pair based self-similarity learning for unsupervised person re-identification. IEEE TIP, 31, 4803–4816.

  • Wu, P., Wang, L., Zhou, S., Hua, G., & Sun, C. (2024). Temporal correlation vision transformer for video person re-identification. AAAI, 38, 6083–6091.

  • Wu, Y., Yan, Z., Han, X., Li, G., Zou, C., & Cui, S. (2021). Lapscore: language-guided person search via color reasoning. In ICCV (pp. 1624–1633).

  • Wu, Z., & Ye, M. (2023). Unsupervised visible-infrared person re-identification via progressive graph matching and alternate learning. In CVPR (pp. 9548–9558).

  • Xiao, H., Lin, W., Sheng, B., Lu, K., Yan, J., Wang, J., Ding, E., Zhang, Y., & Xiong, H. (2018). Group re-identification: Leveraging and integrating multi-grain information. In ACM MM (pp. 192–200).

  • Xiao, T., Li, S., Wang, B., Lin, L., & Wang, X. (2017). Joint detection and identification feature learning for person search. In CVPR (pp. 3415–3424).

  • Xie, Z., Zhang, Z., Cao, Y., Lin, Y., Bao, J., Yao, Z., Dai, Q., & Hu, H. (2022). Simmim: A simple framework for masked image modeling. In CVPR (pp. 9653–9663).

  • Xu, B., He, L., Liang, J., & Sun, Z. (2022). Learning feature recovery transformer for occluded person re-identification. IEEE TIP, 31, 4651–4662.

  • Xu, P., & Zhu, X. (2023). Deepchange: A long-term person re-identification benchmark with clothes change. In ICCV (pp. 11196–11205).

  • Xu, P., Zhu, X., & Clifton, D. A. (2023). Multimodal learning with transformers: A survey. IEEE TPAMI.

  • Xu, W., Liu, H., Shi, W., Miao, Z., Lu, Z., & Chen, F. (2021). Adversarial feature disentanglement for long-term person re-identification. In IJCAI (pp. 1201–1207).

  • Xuan, S., & Zhang, S. (2021). Intra-inter camera similarity for unsupervised person re-identification. In CVPR (pp. 11926–11935).

  • Yan, K., Tian, Y., Wang, Y., Zeng, W., & Huang, T. (2017). Exploiting multi-grain ranking constraints for precisely searching visually-similar vehicles. In ICCV (pp. 562–570).

  • Yan, S., Dong, N., Zhang, L., & Tang, J. (2022). Clip-driven fine-grained text-image person re-identification. arXiv preprint arXiv:2210.10276

  • Yan, Y., Ni, B., Song, Z., Ma, C., Yan, Y., & Yang, X. (2016). Person re-identification via recurrent feature aggregation. In ECCV (pp. 701–716). Springer.

  • Yan, Y., Qin, J., Ni, B., Chen, J., Liu, L., Zhu, F., Zheng, W. S., Yang, X., & Shao, L. (2020). Learning multi-attention context graph for group-based re-identification. IEEE TPAMI, 45(6), 7001–7018.

  • Yang, B., Ye, M., Chen, J., & Wu, Z. (2022). Augmented dual-contrastive aggregation learning for unsupervised visible-infrared person re-identification. In ACM MM (pp. 2843–2851).

  • Yang, B., Chen, J., & Ye, M. (2023a). Top-k visual tokens transformer: Selecting tokens for visible-infrared person re-identification. In ICASSP (pp. 1–5). IEEE.

  • Yang, B., Chen, J., & Ye, M. (2023b). Towards grand unified representation learning for unsupervised visible-infrared person re-identification. In ICCV (pp. 11069–11079).

  • Yang, Q., Wu, A., & Zheng, W. S. (2019). Person re-identification by contour sketch under moderate clothing change. IEEE TPAMI, 43(6), 2029–2046.

  • Yang, S., Zhou, Y., Zheng, Z., Wang, Y., Zhu, L., & Wu, Y. (2023c). Towards unified text-based person retrieval: A large-scale multi-attribute and language search benchmark. In ACM MM (pp. 4492–4501).

  • Yang, Z., Wu, D., Wu, C., Lin, Z., Gu, J., & Wang, W. (2024). A pedestrian is worth one prompt: Towards language guidance person re-identification. In CVPR (pp. 17343–17353).

  • Yao, Y., Zheng, L., Yang, X., Naphade, M., & Gedeon, T. (2020). Simulating content consistent vehicle datasets with attribute descent. In ECCV (pp. 775–791). Springer.

  • Ye, M., Liang, C., Wang, Z., Leng, Q., Chen, J., & Liu, J. (2015). Specific person retrieval via incomplete text description. In ACM ICMR (pp. 547–550).

  • Ye, M., Lan, X., Li, J., & Yuen, P. (2018). Hierarchical discriminative learning for visible thermal person re-identification. In AAAI (vol. 32).

  • Ye, M., Cheng, Y., Lan, X., & Zhu, H. (2019). Improving night-time pedestrian retrieval with distribution alignment and contextual distance. IEEE TII, 16(1), 615–624.

  • Ye, M., Lan, X., Wang, Z., & Yuen, P. C. (2019). Bi-directional center-constrained top-ranking for visible thermal person re-identification. IEEE TIFS, 15, 407–419.

  • Ye, M., Shen, J., & Shao, L. (2020). Visible-infrared person re-identification via homogeneous augmented tri-modal learning. IEEE TIFS, 16, 728–739.

  • Ye, M., Shen, J., Zhang, X., Yuen, P. C., & Chang, S. F. (2020b). Augmentation invariant and instance spreading feature for softmax embedding. IEEE TPAMI.

  • Ye, M., Li, H., Du, B., Shen, J., Shao, L., & Hoi, S. C. (2021). Collaborative refining for person re-identification with label noise. IEEE TIP, 31, 379–391.

  • Ye, M., Ruan, W., Du, B., & Shou, M. Z. (2021b). Channel augmented joint learning for visible-infrared recognition. In ICCV (pp. 13567–13576).

  • Ye, M., Shen, J., Lin, G., Xiang, T., Shao, L., & Hoi, S. C. H. (2021c). Deep learning for person re-identification: A survey and outlook. IEEE TPAMI.

  • Ye, M., Wu, Z., Chen, C., & Du, B. (2023). Channel augmentation for visible-infrared re-identification. IEEE TPAMI, 01, 1–16.

  • Ye, Y., Zhou, H., Yu, J., Hu, Q., & Yang, W. (2022). Dynamic feature pruning and consolidation for occluded person re-identification. arXiv preprint arXiv:2211.14742

  • Yu, H. X., Zheng, W. S., Wu, A., Guo, X., Gong, S., & Lai, J. H. (2019). Unsupervised person re-identification by soft multilabel learning. In CVPR (pp. 2148–2157).

  • Yu, R., Du, D., LaLonde, R., Davila, D., Funk, C., Hoogs, A., & Clipp, B. (2022). Cascade transformers for end-to-end person search. In CVPR (pp. 7267–7276).

  • Zapletal, D., & Herout, A. (2016). Vehicle re-identification for automatic video traffic surveillance. In CVPR workshop (pp. 25–31).

  • Zhai, X., Kolesnikov, A., Houlsby, N., & Beyer, L. (2022a). Scaling vision transformers. In CVPR (pp. 12104–12113).

  • Zhai, Y., Zeng, Y., Cao, D., & Lu, S. (2022b). Trireid: Towards multi-modal person re-identification via descriptive fusion model. In ICMR (pp. 63–71).

  • Zhang, B., Liang, Y., & Du, M. (2022a). Interlaced perception for person re-identification based on swin transformer. In IEEE ICIVC (pp. 24–30).

  • Zhang, G., Zhang, P., Qi, J., & Lu, H. (2021a). Hat: Hierarchical aggregation transformers for person re-identification. In ACM MM (pp. 516–525).

  • Zhang, G., Zhang, Y., Zhang, T., Li, B., & Pu, S. (2023a). Pha: Patch-wise high-frequency augmentation for transformer-based person re-identification. In CVPR (pp. 14133–14142).

  • Zhang, Q., Lai, J. H., Feng, Z., & Xie, X. (2022). Uncertainty modeling with second-order transformer for group re-identification. AAAI, 36, 3318–3325.

  • Zhang, Q., Wang, L., Patel, V. M., Xie, X., & Lai, J. (2024). View-decoupled transformer for person re-identification under aerial-ground camera network. In CVPR (pp. 22000–22009).

  • Zhang, S., Zhang, Q., Yang, Y., Wei, X., Wang, P., Jiao, B., & Zhang, Y. (2020). Person re-identification in aerial imagery. IEEE TMM, 23, 281–291.

  • Zhang, S., Yang, Y., Wang, P., Liang, G., Zhang, X., & Zhang, Y. (2021). Attend to the difference: Cross-modality person re-identification via contrastive correlation. IEEE TIP, 30, 8861–8872.

  • Zhang, T., Wei, L., Xie, L., Zhuang, Z., Zhang, Y., Li, B., & Tian, Q. (2021c). Spatiotemporal transformer for video-based person re-identification. arXiv preprint arXiv:2103.16469

  • Zhang, T., Xie, L., Wei, L., Zhuang, Z., Zhang, Y., Li, B., & Tian, Q. (2021d). Unrealperson: An adaptive pipeline towards costless person re-identification. In CVPR (pp. 11506–11515).

  • Zhang, T., Zhao, Q., Da, C., Zhou, L., Li, L., & Jiancuo, S. (2021e). Yakreid-103: A benchmark for yak re-identification. In IEEE IJCB (pp. 1–8). IEEE.

  • Zhang, X., Ge, Y., Qiao, Y., & Li, H. (2021f). Refining pseudo labels with clustering consensus over generations for unsupervised object re-identification. In CVPR (pp. 3436–3445).

  • Zhang, X., Li, D., Wang, Z., Wang, J., Ding, E., Shi, J. Q., Zhang, Z., & Wang, J. (2022c). Implicit sample extension for unsupervised person re-identification. In CVPR (pp. 7369–7378).

  • Zhang, Y., & Lu, H. (2018). Deep cross-modal projection learning for image-text matching. In ECCV (pp. 686–701).

  • Zhang, Y., Wang, Y., Li, H., & Li, S. (2022d). Cross-compatible embedding and semantic consistent feature construction for sketch re-identification. In ACM MM (pp. 3347–3355).

  • Zhang, Y., Gong, K., Zhang, K., Li, H., Qiao, Y., Ouyang, W., & Yue, X. (2023b). Meta-transformer: A unified framework for multimodal learning. arXiv preprint arXiv:2307.10802

  • Zhang, Z., Lan, C., Zeng, W., Jin, X., & Chen, Z. (2020b). Relation-aware global attention for person re-identification. In CVPR (pp. 3186–3195).

  • Zhao, J., Wang, H., Zhou, Y., Yao, R., Chen, S., & El Saddik, A. (2022). Spatial-channel enhanced transformer for visible-infrared person re-identification. IEEE TMM.

  • Zhao, Y., Zhong, Z., Yang, F., Luo, Z., Lin, Y., Li, S., & Sebe, N. (2021). Learning to generalize unseen domains via memory-based multi-source meta-learning for person re-identification. In CVPR (pp. 6277–6286).

  • Zheng, K., Liu, W., He, L., Mei, T., Luo, J., & Zha, Z. J. (2021). Group-aware label transfer for domain adaptive person re-identification. In CVPR (pp. 5310–5319).

  • Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., & Tian, Q. (2015). Scalable person re-identification: A benchmark. In ICCV (pp. 1116–1124).

  • Zheng, L., Bie, Z., Sun, Y., Wang, J., Su, C., Wang, S., & Tian, Q. (2016a). Mars: A video benchmark for large-scale person re-identification. In ECCV (pp. 868–884). Springer.

  • Zheng, L., Yang, Y., & Hauptmann, A. G. (2016b). Person re-identification: Past, present and future. arXiv preprint arXiv:1610.02984

  • Zheng, L., Zhang, H., Sun, S., Chandraker, M., Yang, Y., & Tian, Q. (2017a). Person re-identification in the wild. In CVPR (pp. 1367–1376).

  • Zheng, W., Gong, S., & Xiang, T. (2009). Associating groups of people. In BMVC (pp. 1–11).

  • Zheng, Z., Zheng, L., & Yang, Y. (2017). A discriminatively learned cnn embedding for person reidentification. ACM TOMM, 14(1), 1–20.

  • Zheng, Z., Zheng, L., & Yang, Y. (2017c). Unlabeled samples generated by gan improve the person re-identification baseline in vitro. In ICCV (pp. 3754–3762).

  • Zhong, Z., Zheng, L., Cao, D., & Li, S. (2017). Re-ranking person re-identification with k-reciprocal encoding. In CVPR (pp. 1318–1327).

  • Zhou, K., Yang, Y., Cavallaro, A., & Xiang, T. (2019). Omni-scale feature learning for person re-identification. In ICCV (pp. 3702–3712).

  • Zhou, K., Yang, J., Loy, C. C., & Liu, Z. (2022). Learning to prompt for vision-language models. IJCV, 130(9), 2337–2348.

  • Zhou, M., Liu, H., Lv, Z., Hong, W., & Chen, X. (2022b). Motion-aware transformer for occluded person re-identification. arXiv preprint arXiv:2202.04243

  • Zhu, A., Wang, Z., Li, Y., Wan, X., Jin, J., Wang, T., Hu, F., & Hua, G. (2021a). Dssl: Deep surroundings-person separation learning for text-based person retrieval. In ACM MM (pp. 209–217).

  • Zhu, H., Ke, W., Li, D., Liu, J., Tian, L., & Shan, Y. (2022a). Dual cross-attention learning for fine-grained visual categorization and object re-identification. In CVPR (pp. 4692–4702).

  • Zhu, K., Guo, H., Zhang, S., Wang, Y., Huang, G., Qiao, H., Liu, J., Wang, J., & Tang, M. (2021b). Aaformer: Auto-aligned transformer for person re-identification. arXiv preprint arXiv:2104.00921

  • Zhu, K., Guo, H., Yan, T., Zhu, Y., Wang, J., & Tang, M. (2022). Pass: Part-aware self-supervised pre-training for person re-identification. ECCV (pp. 198–214). Cham: Springer.

  • Zhuo, J., Chen, Z., Lai, J., & Wang, G. (2018). Occluded person re-identification. In ICME (pp. 1–6). IEEE.

  • Zuerl, M., Dirauf, R., Koeferl, F., Steinlein, N., Sueskind, J., Zanca, D., Brehm, I., von Fersen, L., & Eskofier, B. (2023). Polarbearvidid: A video-based re-identification benchmark dataset for polar bears. Animals, 13(5), 801.

  • Zuo, J., Yu, C., Sang, N., & Gao, C. (2023). Plip: Language-image pre-training for person representation learning. arXiv preprint arXiv:2305.08386

  • Zuo, J., Zhou, H., Nie, Y., Zhang, F., Guo, T., Sang, N., Wang, Y., & Gao, C. (2024). Ufinebench: Towards text-based person retrieval with ultra-fine granularity. In CVPR (pp. 22010–22019).

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Grants 62176188, 62361166629, 62225113).

Author information

Corresponding author

Correspondence to Mang Ye.

Additional information

Communicated by Bumsub Ham.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ye, M., Chen, S., Li, C. et al. Transformer for Object Re-identification: A Survey. Int J Comput Vis 133, 2410–2440 (2025). https://doi.org/10.1007/s11263-024-02284-4
