Abstract
Object Re-identification (Re-ID) aims to identify specific objects across different times and scenes, and it is a widely researched task in computer vision. For a long time, this field was driven predominantly by deep learning based on convolutional neural networks. In recent years, the emergence of Vision Transformers has spurred a growing body of work on Transformer-based Re-ID, which has continuously broken performance records and brought significant progress to the field. Offering a powerful, flexible, and unified solution, Transformers serve a wide array of Re-ID tasks with remarkable efficacy. This paper provides a comprehensive review and in-depth analysis of Transformer-based Re-ID. Categorizing existing works into Image/Video-Based Re-ID, Re-ID with Limited Data/Annotations, Cross-Modal Re-ID, and Special Re-ID Scenarios, we thoroughly elucidate the advantages the Transformer demonstrates in addressing the many challenges across these domains. For the trending unsupervised Re-ID setting, we propose a new Transformer baseline, UntransReID, which achieves state-of-the-art performance on both single- and cross-modal tasks. For the under-explored problem of animal Re-ID, we devise a standardized experimental benchmark and conduct extensive experiments to explore the applicability of Transformers to this task and to facilitate future research. Finally, we discuss several important yet under-investigated open issues in the era of large foundation models; we believe this survey will serve as a new handbook for researchers in the field. A periodically updated website is available at https://github.com/mangye16/ReID-Survey.
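The ViT-style Re-ID pipeline the abstract refers to can be illustrated with a minimal, self-contained sketch: each image is represented as a sequence of patch tokens plus a prepended class token, the sequence passes through self-attention, and the class-token output serves as the identity feature, with retrieval done by cosine similarity against a gallery. All weights below are random placeholders, not a trained model; the dimensions and helper names are illustrative assumptions, not any paper's actual implementation.

```python
import math
import random

random.seed(0)
D = 8        # toy embedding dimension (real ViTs use 768+)
N = 4        # patch tokens per image (real ViTs use e.g. 14x14)

def rand_mat(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def attention(X, Wq, Wk, Wv):
    # Single-head scaled dot-product self-attention over the token sequence.
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    out = []
    for i in range(len(Q)):
        scores = [sum(Q[i][d] * K[j][d] for d in range(D)) / math.sqrt(D)
                  for j in range(len(K))]
        w = softmax(scores)
        out.append([sum(w[j] * V[j][d] for j in range(len(V))) for d in range(D)])
    return out

CLS = [0.1] * D  # class token (learnable in practice, fixed here)
Wq, Wk, Wv = rand_mat(D, D), rand_mat(D, D), rand_mat(D, D)

def embed(patches):
    # Tokens = [CLS] + patch embeddings; the CLS output is the identity feature.
    tokens = [CLS] + patches
    return attention(tokens, Wq, Wk, Wv)[0]

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

# Toy gallery of two identities; the query is a copy of identity 0's patches,
# so it should rank first by cosine similarity of the CLS features.
gallery = [rand_mat(N, D) for _ in range(2)]
query = [row[:] for row in gallery[0]]
feats = [embed(g) for g in gallery]
qf = embed(query)
ranking = sorted(range(len(feats)), key=lambda i: -cosine(qf, feats[i]))
print(ranking[0])
```

In trained systems the attention block is stacked many times, weights are learned with identity classification and metric losses, and retrieval is performed over thousands of gallery features, but the token-plus-retrieval structure is the same.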
Data Availability
The authors declare that publicly available datasets are used for evaluating object Re-ID methods. The data supporting the experiments in this study can be found in the paper. All publicly available person-related datasets used in this study were obtained and used in accordance with appropriate permissions and ethical guidelines.
References
(2022). Beluga id 2022. https://lila.science/datasets/beluga-id-2022/
(2022). Hyena id 2022. https://lila.science/datasets/hyena-id-2022/
(2022). Leopard id 2022. https://lila.science/datasets/leopard-id-2022/
Ahmed, E., Jones, M., & Marks, T. K. (2015). An improved deep learning architecture for person re-identification. In CVPR (pp. 3908–3916).
Bai. Y., Jiao, J., Ce, W., Liu, J., Lou, Y., Feng, X., & Duan, L. Y. (2021a). Person30k: A dual-meta generalization network for person re-identification. In CVPR (pp. 2123–2132).
Bai, Z., Wang, Z., Wang, J., Hu, D., & Ding, E. (2021b). Unsupervised multi-source domain adaptation for person re-identification. In CVPR (pp. 12914–12923).
Bergamini, L., Porrello, A., Dondona, A. C., Del Negro, E., Mattioli, M., D’alterio, N., & Calderara, S. (2018). Multi-views embedding for cattle re-identification. In IEEE SITIS (pp. 184–191).
Bouma, S., Pawley, M. D., Hupman, K., & Gilman, A. (2018). Individual common dolphin identification via metric embedding learning. In IEEE IVCNZ (pp. 1–6).
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. (2020). Language models are few-shot learners. NeurIPS, 33, 1877–1901.
Bruslund Haurum, J., Karpova, A., Pedersen, M., Hein Bengtson, S., & Moeslund, T. B. (2020). Re-identification of zebrafish using metric learning. In WACV workshop (pp. 1–11).
Cao, J., Pang, Y., Anwer, R. M., Cholakkal, H., Xie, J., Shah, M., & Khan, F. S. (2022). Pstr: End-to-end one-step person search with transformers. In CVPR (pp. 9458–9467).
Cao, M., Bai, Y., Zeng, Z., Ye, M., & Zhang, M. (2024). An empirical study of clip for text-based person search. AAAI, 38, 465–473.
Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., & Joulin, A. (2021). Emerging properties in self-supervised vision transformers. In ICCV (pp. 9650–9660).
Chan, J., Carrión, H., Mégret, R., Agosto-Rivera, J. L., & Giray, T. (2022). Honeybee re-identification in video: New datasets and impact of self-supervision. In VISIGRAPP (5: VISAPP) (pp. 517–525).
Cheeseman, T., Southerland, K., Park, J., Olio, M., Flynn, K., Calambokidis, J., Jones, L., Garrigue, C., Frisch Jordan, A., Howard, A., et al. (2022). Advanced image recognition: A fully automated, high-accuracy photo-identification matching system for humpback whales. Mammalian Biology, 102(3), 915–929.
Chen, B., Deng, W., & Hu, J. (2019). Mixed high-order attention network for person re-identification. In ICCV (pp 371–381).
Chen, C., Ye, M., Qi, M., & Du, B. (2022a). Sketch transformer: Asymmetrical disentanglement learning from dynamic synthesis. In ACM MM (pp. 4012–4020).
Chen, C., Ye, M., Qi, M., Wu, J., Jiang, J., & Lin, C. W. (2022). Structure-aware positional transformer for visible-infrared person re-identification. IEEE TIP, 31, 2352–2364.
Chen, C., Ye, M., & Jiang, D. (2023a). Towards modality-agnostic person re-identification with descriptive query. In CVPR (pp. 15128–15137).
Chen, H., Lagadec, B., & Bremond, F. (2021a). Ice: Inter-instance contrastive encoding for unsupervised person re-identification. In ICCV (pp. 14960–14969).
Chen, S., Ye, M., & Du, B. (2022c). Rotation invariant transformer for recognizing object in UAVs. In ACM MM (pp. 2565–2574).
Chen, W., Xu, X., Jia, J., Luo, H., Wang, Y., Wang, F., Jin, R., & Sun, X. (2023b). Beyond appearance: A semantic controllable self-supervised learning framework for human-centric visual tasks. In CVPR (pp. 15050–15061).
Chen, X., Xu, C., Cao, Q., Xu, J., Zhong, Y., Xu, J., Li, Z., Wang, J., & Gao, S. (2021b). Oh-former: Omni-relational high-order transformer for person re-identification. arXiv preprint arXiv:2109.11159
Chen, Y. C., Zhu, X., Zheng, W. S., & Lai, J. H. (2017). Person re-identification by camera correlation aware feature augmentation. IEEE TPAMI, 40(2), 392–408.
Cheng, D., Zhou, J., Wang, N., & Gao, X. (2022). Hybrid dynamic contrast and probability distillation for unsupervised person re-id. IEEE TIP, 31, 3334–3346.
Cheng, D., Huang, X., Wang, N., He, L., Li, Z., & Gao, X. (2023a). Unsupervised visible-infrared person reid by collaborative learning with neighbor-guided label refinement. In ACM MM (pp. 7085–7093).
Cheng, D., Wang, G., Wang, B., Zhang, Q., Han, J., & Zhang, D. (2023). Hybrid routing transformer for zero-shot learning. Pattern Recognition, 137, 109270.
Cheng, D., Wang, G., Wang, N., Zhang, D., Zhang, Q., & Gao, X. (2023). Discriminative and robust attribute alignment for zero-shot learning. IEEE TCSVT, 33(8), 4244–4256.
Cheng, D., Li, Y., Zhang, D., Wang, N., Sun, J., & Gao, X. (2024). Progressive negative enhancing contrastive learning for image dehazing and beyond. In IEEE TMM.
Cheng, X., Jia, M., Wang, Q., & Zhang, J. (2022b). More is better: Multi-source dynamic parsing attention for occluded person re-identification. In ACM MM (pp. 6840–6849).
Cho, Y., Kim, W. J., Hong, S., & Yoon, S. E. (2022). Part-based pseudo label refinement for unsupervised person re-identification. In CVPR (pp. 7308–7318).
Choi, S., Kim, T., Jeong, M., Park, H., & Kim, C. (2021). Meta batch-instance normalization for generalizable person re-identification. In CVPR (pp. 3425–3435).
Ci, Y., Wang, Y., Chen, M., Tang, S., Bai, L., Zhu, F., Zhao, R., Yu, F., Qi, D., & Ouyang, W. (2023). Unihcp: A unified model for human-centric perceptions. In CVPR (pp. 17840–17852).
Comandur, B. (2022). Sports re-id: Improving re-identification of players in broadcast videos of team sports. arXiv preprint arXiv:2206.02373
Dai, Y., Liu, J., Sun, Y., Tong, Z., Zhang, C., & Duan, L. Y. (2021). Idm: An intermediate domain module for domain adaptive person re-id. In ICCV (pp. 11864–11874).
Dai, Z., Wang, G., Yuan, W., Zhu, S., & Tan, P. (2022). Cluster contrast for unsupervised person re-identification. In ACCV (pp. 1142–1160).
Dehghani, M., Djolonga, J., Mustafa, B., Padlewski, P., Heek, J., Gilmer, J., Steiner, A. P., Caron, M., Geirhos, R., & Alabdulmohsin, I., et al. (2023). Scaling vision transformers to 22 billion parameters. In ICML (pp. 7480–7512). PMLR.
Deng, W., Zheng, L., Ye, Q., Kang, G., Yang, Y., & Jiao, J. (2018). Image-image domain adaptation with preserved self-similarity and domain-dissimilarity for person re-identification. In CVPR (pp. 994–1003).
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805
Ding, N., Qin, Y., Yang, G., Wei, F., Yang, Z., Su, Y., Hu, S., Chen, Y., Chan, C. M., Chen, W., et al. (2023). Parameter-efficient fine-tuning of large-scale pre-trained language models. Nature Machine Intelligence, 5(3), 220–235.
Ding, Z., Ding, C., Shao, Z., & Tao, D. (2021). Semantically self-aligned network for text-to-image part-aware person re-identification. arXiv preprint arXiv:2107.12666
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., & Gelly, S., et al. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. In ICLR.
Fan, L., Li, T., Fang, R., Hristov, R., Yuan, Y., & Katabi, D. (2020). Learning longterm representations for person re-identification using radio signals. In CVPR (pp. 10699–10709).
Farooq, A., Awais, M., Kittler, J., & Khalid, S. S. (2022). Axm-net: Implicit cross-modal feature alignment for person re-identification. AAAI, 36, 4477–4485.
Feng, Y., Yu, J., Chen, F., Ji, Y., Wu, F., Liu, S., & Jing, X. Y. (2022). Visible-infrared person re-identification via cross-modality interaction transformer. In IEEE TMM.
Ferdous, S. N., Li, X., & Lyu, S. (2022). Uncertainty aware multitask pyramid vision transformer for uav-based object re-identification. In ICIP (pp. 2381–2385). IEEE.
Fu, D., Chen, D., Bao, J., Yang, H., Yuan, L., Zhang, L., Li, H., & Chen, D. (2021). Unsupervised pre-training for person re-identification. In CVPR (pp. 14750–14759).
Gao, J., Burghardt, T., Andrew, W., Dowsey, A. W., & Campbell, N. W. (2021). Towards self-supervision for video identification of individual holstein-friesian cattle: The cows2021 dataset. arXiv preprint arXiv:2105.01938
Ge, Y., Zhu, F., Chen, D., Zhao, R., et al. (2020). Self-paced contrastive learning with hybrid memory for domain adaptive object re-id. NeurIPS, 33, 11309–11321.
Gray, D., Brennan, S., & Tao, H. (2007). Evaluating appearance models for recognition, reacquisition, and tracking. PETS, 3, 1–7.
Gu, J., Luo, H., Wang, K., Jiang, W., You, Y., & Zhao, J. (2023). Color prompting for data-free continual unsupervised domain adaptive person re-identification. arXiv preprint arXiv:2308.10716
Guo, H., Zhu, K., Tang, M., & Wang, J. (2019). Two-level attention network with multi-grain ranking loss for vehicle re-identification. IEEE TIP, 28(9), 4328–4338.
Guo, P., Liu, H., Wu, J., Wang, G., & Wang, T. (2023). Semantic-aware consistency network for cloth-changing person re-identification. arXiv preprint arXiv:2308.14113
Han, K., Wang, Y., Chen, H., Chen, X., Guo, J., Liu, Z., Tang, Y., Xiao, A., Xu, C., Xu, Y., et al. (2022). A survey on vision transformer. IEEE TPAMI, 45(1), 87–110.
Han, X., He, S., Zhang, L., & Xiang, T. (2021). Text-based person search with limited data. arXiv:2110.10807
He, B., Li, J., Zhao, Y., & Tian, Y. (2019). Part-regularized near-duplicate vehicle re-identification. In CVPR (pp. 3997–4005).
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In CVPR (pp. 770–778).
He, K., Chen, X., Xie, S., Li, Y., Dollár, P., & Girshick, R. (2022). Masked autoencoders are scalable vision learners. In CVPR (pp. 16000–16009).
He, S., Luo, H., Wang, P., Wang, F., Li, H., & Jiang, W. (2021a). Transreid: Transformer-based object re-identification. In ICCV (pp. 15013–15022).
He, S., Chen, W., Wang, K., Luo, H., Wang, F., Jiang, W., & Ding, H. (2023a). Region generation and assessment network for occluded person re-identification. In IEEE TIFS.
He, S., Luo, H., Jiang, W., Jiang, X., & Ding, H. (2023). Vgsg: Vision-guided semantic-group network for text-based person search. IEEE TIP, 33, 163–176.
He, T., Jin, X., Shen, X., Huang, J., Chen, Z., Hua, X. S. (2021b). Dense interaction learning for video-based person re-identification. In ICCV (pp. 1490–1501).
He, T., Shen, X., Huang, J., Chen, Z., & Hua, X. S. (2021c). Partial person re-identification with part-part correspondence learning. In CVPR (pp. 9105–9115).
He, W., Deng, Y., Tang, S., Chen, Q., Xie, Q., Wang, Y., Bai, L., Zhu, F., Zhao, R., & Ouyang, W., et al. (2024). Instruct-reid: A multi-purpose person re-identification task with instructions. In CVPR (pp. 17521–17531).
Hermans, A., Beyer, L., & Leibe, B. (2017). In defense of the triplet loss for person re-identification. arXiv preprint arXiv:1703.07737
Hong, P., Wu, T., Wu, A., Han, X., & Zheng, W. S. (2021). Fine-grained shape-appearance mutual learning for cloth-changing person re-identification. In CVPR (pp. 10513–10522).
Howard, A., Ken, I., Southerland Holbrook. R., & Cheeseman, T. (2022). Happywhale - whale and dolphin identification. https://kaggle.com/competitions/happy-whale-and-dolphin
Jia, M., Cheng, X., Lu, S., & Zhang, J. (2022). Learning disentangled representation implicitly via transformer for occluded person re-identification. IEEE TMM, 25, 1294–1305.
Jia, X., Zhong, X., Ye, M., Liu, W., & Huang, W. (2022). Complementary data augmentation for cloth-changing person re-identification. IEEE TIP, 31, 4227–4239.
Jiang, D., & Ye, M. (2023). Cross-modal implicit relation reasoning and aligning for text-to-image person retrieval. In CVPR (pp. 2787–2797).
Jiang, K., Zhang, T., Liu, X., Qian, B., Zhang, Y., & Wu, F. (2022). Cross-modality transformer for visible-infrared person re-identification. In ECCV (pp. 480–496). Springer.
Jiao, B., Liu, L., Gao, L., Wu, R., Lin, G., Wang, P., & Zhang, Y. (2023). Toward re-identifying any animal. In NeurIPS.
Jin, X., Lan, C., Zeng, W., Chen, Z., & Zhang, L. (2020). Style normalization and restitution for generalizable person re-identification. In CVPR (pp. 3143–3152).
Jin, X., He, T., Zheng, K., Yin, Z., Shen, X., Huang, Z., Feng, R., Huang, J., Chen, Z., & Hua, X. S. (2022). Cloth-changing person re-identification from a single image with gait prediction and regularization. In CVPR (pp. 14278–14287).
Kalayeh, M. M., Basaran, E., Gökmen, M., Kamasak, M. E., & Shah, M. (2018). Human semantic parsing for person re-identification. In CVPR (pp. 1062–1071).
Khan, S. D., & Ullah, H. (2019). A survey of advances in vision-based vehicle re-identification. CVIU, 182, 50–63.
Khorramshahi, P., Kumar, A., Peri, N., Rambhatla, S. S., Chen, J. C., & Chellappa, R. (2019). A dual-path model with adaptive attention for vehicle re-identification. In ICCV (pp. 6132–6141).
Koch, G., Zemel, R., & Salakhutdinov, R., et al. (2015). Siamese neural networks for one-shot image recognition. In ICML workshop (vol. 2). Lille.
Konovalov, D. A., Hillcoat, S., Williams, G., Birtles, R. A., Gardiner, N., & Curnock, M. I. (2018). Individual minke whale recognition using deep learning convolutional neural networks. Journal of Geoscience and Environment Protection, 6, 25–36.
Korschens, M., & Denzler, J. (2019). Elpephants: A fine-grained dataset for elephant re-identification. In ICCV workshop.
Kumar, S., Yaghoubi, E., Das, A., Harish, B., & Proença, H. (2020). The p-destre: A fully annotated dataset for pedestrian detection, tracking, re-identification and search from aerial devices. arXiv preprint arXiv:2004.02782
Kuncheva, L. I., Williams, F., Hennessey, S. L., & Rodríguez, J. J. (2022). A benchmark database for animal re-identification and tracking. In IEEE IPAS (pp. 1–6). IEEE.
Lai, S., Chai, Z., & Wei, X. (2021). Transformer meets part model: Adaptive part division for person re-identification. In ICCV (pp. 4150–4157).
Lee, K. W., Jawade, B., Mohan, D., Setlur, S., & Govindaraju, V. (2022). Attribute de-biased vision transformer (ad-vit) for long-term person re-identification. In IEEE AVSS (pp. 1–8) . IEEE.
Li, H., Li, C., Zhu, X., Zheng, A., & Luo, B. (2020). Multi-spectral vehicle re-identification: A challenge. AAAI, 34, 11345–11353.
Li, H., Wu, G., & Zheng, W. S. (2021a). Combined depth space based architecture search for person re-identification. In CVPR (pp. 6729–6738).
Li, H., Ye, M., & Du, B. (2021b). Weperson: Learning a generalized re-identification model from all-weather virtual data. In ACM MM (pp. 3115–3123).
Li, H., Li, C., Zheng, A., Tang, J., & Luo, B. (2022). Mskat: Multi-scale knowledge-aware transformer for vehicle re-identification. IEEE TITS, 23(10), 19557–19568.
Li, H., Ye, M., Wang, C., & Du, B. (2022b). Pyramidal transformer with conv-patchify for person re-identification. In ACM MM (pp. 7317–7326).
Li, H., Ye, M., Zhang, M., Du, B. (2024a). All in one framework for multimodal re-identification in the wild. In CVPR (pp. 17459–17469).
Li, M., Zhu, X., & Gong, S. (2019). Unsupervised tracklet person re-identification. IEEE TPAMI, 42(7), 1770–1782.
Li, S., Xiao, T., Li, H., Zhou, B., Yue, D., & Wang, X. (2017). Person search with natural language description. In CVPR (pp. 1970–1979).
Li, S., Li, J., Tang, H., Qian, R., & Lin, W. (2019b). Atrw: A benchmark for amur tiger re-identification in the wild. arXiv preprint arXiv:1906.05586
Li, S., Fu, L., Sun, Y., Mu, Y., Chen, L., Li, J., & Gong, H. (2021). Individual dairy cow identification based on lightweight convolutional neural network. Plos one, 16(11), e0260510.
Li, S., Sun, L., & Li, Q. (2023). Clip-reid: Exploiting vision-language model for image re-identification without concrete text labels. AAAI, 37, 1405–1413.
Li, T., Liu, J., Zhang, W., Ni, Y., Wang, W., & Li, Z. (2021d). Uav-human: A large benchmark for human behavior understanding with unmanned aerial vehicles. In CVPR (pp. 16266–16275).
Li, W., Zhao, R., Xiao, T., & Wang, X. (2014). Deepreid: Deep filter pairing neural network for person re-identification. In CVPR (pp. 152–159).
Li, W., Zhu, X., & Gong, S. (2018). Harmonious attention network for person re-identification. In CVPR (pp. 2285–2294).
Li, W., Zou, C., Wang, M., Xu, F., Zhao, J., Zheng, R., Cheng, Y., & Chu, W. (2023b). Dc-former: Diverse and compact transformer for person re-identification. arXiv preprint arXiv:2302.14335
Li, Y., He, J., Zhang, T., Liu, X., Zhang, Y., & Wu, F. (2021e). Diverse part discovery: Occluded person re-identification with part-aware transformer. In CVPR (pp. 2898–2907).
Li, Y., Liu, Y., Zhang, H., Zhao, C., Wei, Z., & Miao, D. (2024b). Occlusion-aware transformer with second-order attention for person re-identification. IEEE TIP
Liang, T., Jin, Y., Liu, W., & Li, Y. (2023). Cross-modality transformer with modality mining for visible-infrared person re-identification. IEEE TMM
Liao, S., & Shao, L. (2021). Transmatcher: Deep image matching through transformers for generalizable person re-identification. NeurIPS, 34, 1992–2003.
Liao, S., Hu, Y., Zhu, X., & Li, S. Z. (2015). Person re-identification by local maximal occurrence representation and metric learning. In CVPR (pp. 2197–2206).
Lin, W., Li, Y., Xiao, H., See, J., Zou, J., Xiong, H., Wang, J., & Mei, T. (2019). Group reidentification with multigrained matching and integration. IEEE transactions on cybernetics, 51(3), 1478–1492.
Lin, Y., Dong, X., Zheng, L., Yan, Y., & Yang, Y. (2019). A bottom-up clustering approach to unsupervised person re-identification. AAAI, 33, 8738–8745.
Lin, Y., Xie, L., Wu, Y., Yan, C., & Tian, Q. (2020). Unsupervised person re-identification via softened similarity learning. In CVPR (pp. 3390–3399).
Liu, F., Ye, M., & Du, B. (2023a). Dual level adaptive weighting for cloth-changing person re-identification. IEEE TIP
Liu, H., Jie, Z., Jayashree, K., Qi, M., Jiang, J., Yan, S., & Feng, J. (2017). Video-based person re-identification with accumulative motion context. IEEE Transactions on Circuits and Systems for Video Technology, 28(10), 2788–2802.
Liu, X., Liu, W., Ma, H., & Fu, H. (2016a). Large-scale vehicle re-identification in urban surveillance videos. In ICME (pp. 1–6). IEEE.
Liu, X., Liu, W., Mei, T., & Ma, H. (2016b). A deep learning-based approach to progressive vehicle re-identification for urban surveillance. In ECCV (pp. 869–884). Springer.
Liu, X., Zhang, P., Yu, C., Lu, H., Qian, X., & Yang, X. (2021a). A video is worth three views: Trigeminal transformers for video-based person re-identification. arXiv preprint arXiv:2104.01745
Liu, X., Yu, C., Zhang, P., & Lu, H. (2023b). Deeply coupled convolution–transformer with spatial–temporal complementary learning for video-based person re-identification. In IEEE TNNLS.
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., & Guo, B. (2021b). Swin transformer: Hierarchical vision transformer using shifted windows. arXiv preprint arXiv:2103.14030
Lou, Y., Bai, Y., Liu, J., Wang, S., & Duan, L. (2019). Veri-wild: A large dataset and a new method for vehicle re-identification in the wild. In CVPR (pp. 3235–3243).
Luo, H., Jiang, W., Gu, Y., Liu, F., Liao, X., Lai, S., & Gu, J. (2019). A strong baseline and batch normalization neck for deep person re-identification. IEEE TMM, 22(10), 2597–2609.
Luo, H., Wang, P., Xu, Y., Ding, F., Zhou, Y., Wang, F., Li, H., & Jin, R. (2021). Self-supervised pre-training for transformer-based person re-identification. arXiv preprint arXiv:2111.12084
Mallat, S. G. (1989). A theory for multiresolution signal decomposition: The wavelet representation. IEEE TPAMI, 11(7), 674–693.
Mao, J., Yao, Y., Sun, Z., Huang, X., Shen, F., & Shen, H. T. (2023). Attention map guided transformer pruning for occluded person re-identification on edge device. In IEEE TMM.
McLaughlin, N., Del Rincon, J. M., & Miller, P. (2016). Recurrent convolutional network for video-based person re-identification. In CVPR (pp. 1325–1334).
Meng, D., Li, L., Liu, X., Li, Y., Yang, S., Zha, Z. J., Gao, X., Wang, S., Huang, Q. (2020). Parsing-based view-aware embedding network for vehicle re-identification. In CVPR (pp. 7103–7112).
Miao, J., Wu, Y., Liu, P., Ding, Y., & Yang, Y. (2019). Pose-guided feature alignment for occluded person re-identification. In ICCV (pp. 542–551).
Moskvyak, O., Maire, F., Dayoub, F., & Baktashmotlagh, M. (2020). Learning landmark guided embeddings for animal re-identification. In WACV workshop (pp. 12–19).
Moskvyak, O., Maire, F., Dayoub, F., Armstrong, A. O., & Baktashmotlagh, M. (2021). Robust re-identification of manta rays from natural markings by learning pose invariant embeddings. In DICTA (pp. 1–8). IEEE.
Naseer, M., Ranasinghe, K., Khan, S., Hayat, M., Khan, F. S., & Yang, M. H. (2021). Intriguing properties of vision transformers. arXiv preprint arXiv:2105.10497
Nepovinnykh, E., Eerola, T., & Kalviainen, H. (2020). Siamese network based pelage pattern matching for ringed seal re-identification. In WACV workshop (pp. 25–34).
Nepovinnykh, E., Eerola, T., Biard, V., Mutka, P., Niemi, M., Kunnasranta, M., & Kälviäinen, H. (2022). Sealid: Saimaa ringed seal re-identification dataset. Sensors, 22(19), 7602.
Nguyen, D. T., Hong, H. G., Kim, K. W., & Park, K. R. (2017). Person recognition system based on a combination of body images from visible light and thermal cameras. Sensors, 17(3), 605.
Ni, H., Song, J., Luo, X., Zheng, F., Li, W., & Shen, H. T. (2022). Meta distribution alignment for generalizable person re-identification. In CVPR (pp. 2487–2496).
Ni, H., Li, Y., Gao, L., Shen, H. T., & Song, J. (2023). Part-aware transformer for generalizable person re-identification. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 11280–11289).
Niu, K., Huang, Y., Ouyang, W., & Wang, L. (2020). Improving description-based person re-identification by multi-granularity image-text alignments. In IEEE TIP (pp. 5542–5556).
Organisciak, D., Poyser, M., Alsehaim, A., Hu, S., Isaac-Medina, B. K., Breckon, T. P., Shum, H. P. (2021). Uav-reid: A benchmark on unmanned aerial vehicle re-identification in video imagery. arXiv preprint arXiv:2104.06219
Pang, L., Wang, Y., Song, Y. Z., Huang, T., Tian, Y. (2018). Cross-domain adversarial feature learning for sketch re-identification. In ACM MM (pp. 609–617).
Papafitsoros, K., Adam, L., Čermák, V., & Picek, L. (2022). Seaturtleid: A novel long-span dataset highlighting the importance of timestamps in wildlife re-identification. arXiv preprint arXiv:2211.10307
Parham, J., Crall, J., Stewart, C., Berger-Wolf, T., Rubenstein, D. I. (2017). Animal population censusing at scale with citizen science and photographic identification. In AAAI.
Park, H., & Ham, B. (2020). Relation network for person re-identification. AAAI, 34, 11839–11847.
Porrello, A., Bergamini, L., & Calderara, S. (2020). Robust re-identification by multiple views knowledge distillation. In ECCV (pp. 93–110). Springer.
Pu, N., Zhong, Z., Sebe, N., Lew, M. S. (2023). A memorizing and generalizing framework for lifelong person re-identification. In IEEE TPAMI
Qian, W., Luo, H., Peng, S., Wang, F., Chen, C., & Li, H. (2022). Unstructured feature decoupling for vehicle re-identification. In ECCV (pp. 336–353).
Qian, X., Wang, W., Zhang, L., Zhu, F., Fu, Y., Xiang, T., Jiang, Y. G., & Xue, X. (2020). Long-term cloth-changing person re-identification. In ACCV.
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., & Clark, J., et al. (2021). Learning transferable visual models from natural language supervision. In ICML (pp. 8748–8763). PMLR.
Rao, H., & Miao, C. (2023). Transg: Transformer-based skeleton graph prototype contrastive learning with structure-trajectory prompted reconstruction for person re-identification. In CVPR (pp. 22118–22128).
Rao, H., Wang, S., Hu, X., Tan, M., Guo, Y., Cheng, J., Liu, X., & Hu, B. (2021). A self-supervised gait encoding approach with locality-awareness for 3d skeleton based person re-identification. IEEE TPAMI, 44(10), 6649–6666.
Rao, H., Leung, C., & Miao, C. (2024). Hierarchical skeleton meta-prototype contrastive learning with hard skeleton mining for unsupervised person re-identification. IJCV, 132(1), 238–260.
Sarafianos, N., Xu, X., & Kakadiaris, I. A. (2019). Adversarial representation learning for text-to-image matching. In ICCV (pp. 5814–5824).
Schneider, S., Taylor, G. W., Linquist, S., & Kremer, S. C. (2019). Past, present and future approaches using computer vision for animal re-identification from camera trap data. Methods in Ecology and Evolution, 10(4), 461–470.
Shao, Z., Zhang, X., Fang, M., Lin, Z., Wang, J., & Ding, C. (2022). Learning granularity-unified representations for text-to-image person re-identification. In ACM MM (pp. 5566–5574).
Shao, Z., Zhang, X., Ding, C., Wang, J., & Wang, J. (2023). Unified pre-training with pseudo texts for text-to-image person re-identification. In ICCV (pp. 11174–11184).
Shen, F., Xie, Y., Zhu, J., Zhu, X., & Zeng, H. (2023). Git: Graph interactive transformer for vehicle re-identification. IEEE TIP, 32, 1039–1051.
Shen, L., He, T., Guo, Y., & Ding, G. (2023b). X-reid: Cross-instance transformer for identity-level person re-identification. arXiv preprint arXiv:2302.02075
Shu, X., Wen, W., Wu, H., Chen, K., Song, Y., Qiao, R., Ren, B., & Wang, X. (2022). See finer, see more: Implicit modality alignment for text-based person retrieval. In ECCV (pp. 624–641). Springer.
Song, G., Leng, B., Liu, Y., Hetang, C., & Cai, S. (2018). Region-based quality estimation network for large-scale person re-identification. In AAAI (vol. 32).
Su, C., Li, J., Zhang, S., Xing, J., Gao, W., & Tian, Q. (2017). Pose-driven deep convolutional model for person re-identification. In ICCV (pp. 3960–3969).
Suh, Y., Wang, J., Tang, S., Mei, T., & Lee, K. M. (2018). Part-aligned bilinear representations for person re-identification. In ECCV (pp. 402–419).
Sun, C. C., Arr, G. S., Ramachandran, R. P., & Ritchie, S. G. (2004). Vehicle reidentification using multidetector fusion. IEEE TITS, 5(3), 155–164.
Sun, X., & Zheng, L. (2019). Dissecting person re-identification from the viewpoint of viewpoint. In CVPR (pp. 608–617).
Sun, Y., Zheng, L., Yang, Y., Tian, Q., & Wang, S. (2018). Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline). In ECCV (pp. 480–496)
Tan, B., Xu, L., Qiu, Z., Wu, Q., & Meng, F. (2023). Mfat: A multi-level feature aggregated transformer for person re-identification. In ICASSP (pp. 1–5). IEEE.
Tan, W., Ding, C., Jiang, J., Wang, F., Zhan, Y., & Tao, D. (2024). Harnessing the power of mllms for transferable text-to-image person reid. In CVPR (pp. 17127–17137).
Tang, S., Chen, C., Xie, Q., Chen, M., Wang, Y., Ci, Y., Bai, L., Zhu, F., Yang, H., & Yi, L., et al. (2023). Humanbench: Towards general human-centric perception with projector assisted pretraining. In CVPR (pp. 21970–21982).
Tang, Z., Naphade, M., Liu, M. Y., Yang, X., Birchfield, S., Wang, S., Kumar, R., Anastasiu, D., & Hwang, J. N. (2019). Cityflow: A city-scale benchmark for multi-target multi-camera vehicle tracking and re-identification. In CVPR (pp. 8797–8806).
Tang, Z., Zhang, R., Peng, Z., Chen, J., & Lin, L. (2022). Multi-stage spatio-temporal aggregation transformer for video person re-identification. In IEEE TMM.
Teng, S., Zhang, S., Huang, Q., & Sebe, N. (2021). Viewpoint and scale consistency reinforcement for uav vehicle re-identification. IJCV, 129, 719–735.
Tian, X., Liu, J., Zhang, Z., Wang, C., Qu, Y., Xie, Y., & Ma, L. (2022). Hierarchical walking transformer for object re-identification. In ACM MM (pp. 4224–4232).
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. NeurIPS 30
Walmer, M., Suri, S., Gupta, K., & Shrivastava, A. (2023). Teaching matters: Investigating the role of supervision in vision transformers. In CVPR (pp. 7486–7496).
Wang, D., & Zhang, S. (2020). Unsupervised person re-identification via multi-label classification. In CVPR (pp. 10981–10990).
Wang, G., Zhang, T., Cheng, J., Liu, S., Yang, Y., & Hou, Z. (2019a). Rgb-infrared cross-modality person re-identification via joint pixel and feature alignment. In ICCV (pp. 3623–3632).
Wang, G., Yang, S., Liu, H., Wang, Z., Yang, Y., Wang, S., Yu, G., Zhou, E., & Sun, J. (2020a). High-order information matters: Learning relation and topology for occluded person re-identification. In CVPR (pp. 6449–6458).
Wang, G., Yu, F., Li, J., Jia, Q., & Ding, S. (2023a). Exploiting the textual potential from vision-language pre-training for text-based person search. arXiv preprint arXiv:2303.04497
Wang, G. A., Zhang, T., Yang, Y., Cheng, J., Chang, J., Liang, X., & Hou, Z. G. (2020). Cross-modality paired-images generation for rgb-infrared person re-identification. AAAI, 34, 12144–12151.
Wang, H., Shen, J., Liu, Y., Gao, Y., & Gavves, E. (2022a). Nformer: Robust person re-identification with neighbor transformer. In CVPR (pp. 7297–7307).
Wang, J., Zhang, Z., Chen, M., Zhang, Y., Wang, C., Sheng, B., Qu, Y., & Xie, Y. (2022b). Optimal transport for label-efficient visible-infrared person re-identification. In ECCV (pp. 93–109). Springer.
Wang, L., Ding, R., Zhai, Y., Zhang, Q., Tang, W., Zheng, N., & Hua, G. (2021). Giant panda identification. IEEE TIP, 30, 2837–2849.
Wang, P., Jiao, B., Yang, L., Yang, Y., Zhang, S., Wei, W., & Zhang, Y. (2019b). Vehicle re-identification in aerial imagery: Dataset and approach. In ICCV (pp. 460–469).
Wang, T., Liu, H., Song, P., Guo, T., & Shi, W. (2022). Pose-guided feature disentangling for occluded person re-identification based on transformer. AAAI, 36, 2540–2549.
Wang, T., Liu, H., Li, W., Ban, M., Guo, T., & Li, Y. (2023b). Feature completion transformer for occluded person re-identification. arXiv preprint arXiv:2303.01656
Wang, W., Xie, E., Li, X., Fan, D. P., Song, K., Liang, D., Lu, T., Luo, P., & Shao, L. (2021b). Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. arXiv preprint arXiv:2102.12122
Wang, X., Wang, X., Jiang, B., & Luo, B. (2023c). Few-shot learning meets transformer: Unified query-support transformers for few-shot classification. IEEE TCSVT.
Wang, Y., Qi, G., Li, S., Chai, Y., & Li, H. (2022d). Body part-level domain alignment for domain-adaptive person re-identification with transformer framework. IEEE TIFS, 17, 3321–3334.
Wang, Z., Wang, Z., Zheng, Y., Chuang, Y. Y., & Satoh, S. (2019c). Learning to reduce dual-level discrepancy for infrared-visible person re-identification. In CVPR (pp. 618–626).
Wang, Z., Wang, Z., Zheng, Y., Wu, Y., Zeng, W., & Satoh, S. (2019d). Beyond intra-modality: A survey of heterogeneous person re-identification. arXiv preprint arXiv:1905.10048
Wang, Z., Fang, Z., Wang, J., & Yang, Y. (2020c). Vitaa: Visual-textual attributes alignment in person search by natural language. In ECCV (pp. 402–420). Springer.
Wei, L., Zhang, S., Gao, W., & Tian, Q. (2018). Person transfer gan to bridge domain gap for person re-identification. In CVPR (pp. 79–88).
Wei, R., Gu, J., He, S., & Jiang, W. (2022). Transformer-based domain-specific representation for unsupervised domain adaptive vehicle re-identification. IEEE TITS, 24(3), 2935–2946.
Weideman, H., Stewart, C., Parham, J., Holmberg, J., Flynn, K., Calambokidis, J., Paul, D. B., Bedetti, A., Henley, M., & Pope, F., et al. (2020). Extracting identifying contours for african elephants and humpback whales using a learned appearance model. In WACV (pp. 1276–1285).
Weideman, H. J., Jablons, Z. M., Holmberg, J., Flynn, K., Calambokidis, J., Tyson, R. B., Allen, J. B., Wells, R. S., Hupman, K., & Urian, K., et al. (2017). Integral curvature representation and matching algorithms for identification of dolphins and whales. In ICCV workshop (pp. 2831–2839).
Wu, A., Zheng, W. S., Yu, H. X., Gong, S., & Lai, J. (2017). Rgb-infrared cross-modality person re-identification. In ICCV (pp. 5380–5389).
Wu, J., He, L., Liu, W., Yang, Y., Lei, Z., Mei, T., & Li, S. Z. (2022a). Cavit: Contextual alignment vision transformer for video object re-identification. In ECCV (pp. 549–566). Springer.
Wu, L., Liu, D., Zhang, W., Chen, D., Ge, Z., Boussaid, F., Bennamoun, M., & Shen, J. (2022b). Pseudo-pair based self-similarity learning for unsupervised person re-identification. IEEE TIP, 31, 4803–4816.
Wu, P., Wang, L., Zhou, S., Hua, G., & Sun, C. (2024). Temporal correlation vision transformer for video person re-identification. AAAI, 38, 6083–6091.
Wu, Y., Yan, Z., Han, X., Li, G., Zou, C., & Cui, S. (2021). Lapscore: language-guided person search via color reasoning. In ICCV (pp. 1624–1633).
Wu, Z., & Ye, M. (2023). Unsupervised visible-infrared person re-identification via progressive graph matching and alternate learning. In CVPR (pp. 9548–9558).
Xiao, H., Lin, W., Sheng, B., Lu, K., Yan, J., Wang, J., Ding, E., Zhang, Y., & Xiong, H. (2018). Group re-identification: Leveraging and integrating multi-grain information. In ACM MM (pp. 192–200).
Xiao, T., Li, S., Wang, B., Lin, L., & Wang, X. (2017). Joint detection and identification feature learning for person search. In CVPR (pp. 3415–3424).
Xie, Z., Zhang, Z., Cao, Y., Lin, Y., Bao, J., Yao, Z., Dai, Q., & Hu, H. (2022). Simmim: A simple framework for masked image modeling. In CVPR (pp. 9653–9663).
Xu, B., He, L., Liang, J., & Sun, Z. (2022). Learning feature recovery transformer for occluded person re-identification. IEEE TIP, 31, 4651–4662.
Xu, P., & Zhu, X. (2023). Deepchange: A long-term person re-identification benchmark with clothes change. In ICCV (pp. 11196–11205).
Xu, P., Zhu, X., & Clifton, D. A. (2023). Multimodal learning with transformers: A survey. IEEE TPAMI.
Xu, W., Liu, H., Shi, W., Miao, Z., Lu, Z., & Chen, F. (2021). Adversarial feature disentanglement for long-term person re-identification. In IJCAI (pp. 1201–1207).
Xuan, S., & Zhang, S. (2021). Intra-inter camera similarity for unsupervised person re-identification. In CVPR (pp. 11926–11935).
Yan, K., Tian, Y., Wang, Y., Zeng, W., & Huang, T. (2017). Exploiting multi-grain ranking constraints for precisely searching visually-similar vehicles. In ICCV (pp. 562–570).
Yan, S., Dong, N., Zhang, L., & Tang, J. (2022). Clip-driven fine-grained text-image person re-identification. arXiv preprint arXiv:2210.10276
Yan, Y., Ni, B., Song, Z., Ma, C., Yan, Y., & Yang, X. (2016). Person re-identification via recurrent feature aggregation. In ECCV (pp. 701–716). Springer.
Yan, Y., Qin, J., Ni, B., Chen, J., Liu, L., Zhu, F., Zheng, W. S., Yang, X., & Shao, L. (2020). Learning multi-attention context graph for group-based re-identification. IEEE TPAMI, 45(6), 7001–7018.
Yang, B., Ye, M., Chen, J., & Wu, Z. (2022). Augmented dual-contrastive aggregation learning for unsupervised visible-infrared person re-identification. In ACM MM (pp. 2843–2851).
Yang, B., Chen, J., & Ye, M. (2023a). Top-k visual tokens transformer: Selecting tokens for visible-infrared person re-identification. In ICASSP (pp. 1–5). IEEE.
Yang, B., Chen, J., & Ye, M. (2023b). Towards grand unified representation learning for unsupervised visible-infrared person re-identification. In ICCV (pp. 11069–11079).
Yang, Q., Wu, A., & Zheng, W. S. (2019). Person re-identification by contour sketch under moderate clothing change. IEEE TPAMI, 43(6), 2029–2046.
Yang, S., Zhou, Y., Zheng, Z., Wang, Y., Zhu, L., & Wu, Y. (2023c). Towards unified text-based person retrieval: A large-scale multi-attribute and language search benchmark. In ACM MM (pp. 4492–4501).
Yang, Z., Wu, D., Wu, C., Lin, Z., Gu, J., & Wang, W. (2024). A pedestrian is worth one prompt: Towards language guidance person re-identification. In CVPR (pp. 17343–17353).
Yao, Y., Zheng, L., Yang, X., Naphade, M., & Gedeon, T. (2020). Simulating content consistent vehicle datasets with attribute descent. In ECCV (pp. 775–791). Springer.
Ye, M., Liang, C., Wang, Z., Leng, Q., Chen, J., & Liu, J. (2015). Specific person retrieval via incomplete text description. In ACM ICMR (pp. 547–550).
Ye, M., Lan, X., Li, J., & Yuen, P. (2018). Hierarchical discriminative learning for visible thermal person re-identification. In AAAI (vol. 32).
Ye, M., Cheng, Y., Lan, X., & Zhu, H. (2019a). Improving night-time pedestrian retrieval with distribution alignment and contextual distance. IEEE TII, 16(1), 615–624.
Ye, M., Lan, X., Wang, Z., & Yuen, P. C. (2019b). Bi-directional center-constrained top-ranking for visible thermal person re-identification. IEEE TIFS, 15, 407–419.
Ye, M., Shen, J., & Shao, L. (2020a). Visible-infrared person re-identification via homogeneous augmented tri-modal learning. IEEE TIFS, 16, 728–739.
Ye, M., Shen, J., Zhang, X., Yuen, P. C., & Chang, S. F. (2020b). Augmentation invariant and instance spreading feature for softmax embedding. IEEE TPAMI.
Ye, M., Li, H., Du, B., Shen, J., Shao, L., & Hoi, S. C. (2021a). Collaborative refining for person re-identification with label noise. IEEE TIP, 31, 379–391.
Ye, M., Ruan, W., Du, B., & Shou, M. Z. (2021b). Channel augmented joint learning for visible-infrared recognition. In ICCV (pp. 13567–13576).
Ye, M., Shen, J., Lin, G., Xiang, T., Shao, L., & Hoi, S. C. H. (2021c). Deep learning for person re-identification: A survey and outlook. IEEE TPAMI.
Ye, M., Wu, Z., Chen, C., & Du, B. (2023). Channel augmentation for visible-infrared re-identification. IEEE TPAMI, 01, 1–16.
Ye, Y., Zhou, H., Yu, J., Hu, Q., & Yang, W. (2022). Dynamic feature pruning and consolidation for occluded person re-identification. arXiv preprint arXiv:2211.14742
Yu, H. X., Zheng, W. S., Wu, A., Guo, X., Gong, S., & Lai, J. H. (2019). Unsupervised person re-identification by soft multilabel learning. In CVPR (pp. 2148–2157).
Yu, R., Du, D., LaLonde, R., Davila, D., Funk, C., Hoogs, A., & Clipp, B. (2022). Cascade transformers for end-to-end person search. In CVPR (pp. 7267–7276).
Zapletal, D., & Herout, A. (2016). Vehicle re-identification for automatic video traffic surveillance. In CVPR workshop (pp. 25–31).
Zhai, X., Kolesnikov, A., Houlsby, N., & Beyer, L. (2022a). Scaling vision transformers. In CVPR (pp. 12104–12113).
Zhai, Y., Zeng, Y., Cao, D., & Lu, S. (2022b). Trireid: Towards multi-modal person re-identification via descriptive fusion model. In ICMR (pp. 63–71).
Zhang, B., Liang, Y., & Du, M. (2022a). Interlaced perception for person re-identification based on swin transformer. In IEEE ICIVC (pp. 24–30).
Zhang, G., Zhang, P., Qi, J., & Lu, H. (2021a). Hat: Hierarchical aggregation transformers for person re-identification. In ACM MM (pp. 516–525).
Zhang, G., Zhang, Y., Zhang, T., Li, B., & Pu, S. (2023a). Pha: Patch-wise high-frequency augmentation for transformer-based person re-identification. In CVPR (pp. 14133–14142).
Zhang, Q., Lai, J. H., Feng, Z., & Xie, X. (2022b). Uncertainty modeling with second-order transformer for group re-identification. AAAI, 36, 3318–3325.
Zhang, Q., Wang, L., Patel, V. M., Xie, X., & Lai, J. (2024). View-decoupled transformer for person re-identification under aerial-ground camera network. In CVPR (pp. 22000–22009).
Zhang, S., Zhang, Q., Yang, Y., Wei, X., Wang, P., Jiao, B., & Zhang, Y. (2020a). Person re-identification in aerial imagery. IEEE TMM, 23, 281–291.
Zhang, S., Yang, Y., Wang, P., Liang, G., Zhang, X., & Zhang, Y. (2021b). Attend to the difference: Cross-modality person re-identification via contrastive correlation. IEEE TIP, 30, 8861–8872.
Zhang, T., Wei, L., Xie, L., Zhuang, Z., Zhang, Y., Li, B., & Tian, Q. (2021c). Spatiotemporal transformer for video-based person re-identification. arXiv preprint arXiv:2103.16469
Zhang, T., Xie, L., Wei, L., Zhuang, Z., Zhang, Y., Li, B., & Tian, Q. (2021d). Unrealperson: An adaptive pipeline towards costless person re-identification. In CVPR (pp. 11506–11515).
Zhang, T., Zhao, Q., Da, C., Zhou, L., Li, L., & Jiancuo, S. (2021e). Yakreid-103: A benchmark for yak re-identification. In IEEE IJCB (pp. 1–8). IEEE.
Zhang, X., Ge, Y., Qiao, Y., & Li, H. (2021f). Refining pseudo labels with clustering consensus over generations for unsupervised object re-identification. In CVPR (pp. 3436–3445).
Zhang, X., Li, D., Wang, Z., Wang, J., Ding, E., Shi, J. Q., Zhang, Z., & Wang, J. (2022c). Implicit sample extension for unsupervised person re-identification. In CVPR (pp. 7369–7378).
Zhang, Y., & Lu, H. (2018). Deep cross-modal projection learning for image-text matching. In ECCV (pp. 686–701).
Zhang, Y., Wang, Y., Li, H., & Li, S. (2022d). Cross-compatible embedding and semantic consistent feature construction for sketch re-identification. In ACM MM (pp. 3347–3355).
Zhang, Y., Gong, K., Zhang, K., Li, H., Qiao, Y., Ouyang, W., & Yue, X. (2023b). Meta-transformer: A unified framework for multimodal learning. arXiv preprint arXiv:2307.10802
Zhang, Z., Lan, C., Zeng, W., Jin, X., & Chen, Z. (2020b). Relation-aware global attention for person re-identification. In CVPR (pp. 3186–3195).
Zhao, J., Wang, H., Zhou, Y., Yao, R., Chen, S., & El Saddik, A. (2022). Spatial-channel enhanced transformer for visible-infrared person re-identification. IEEE TMM.
Zhao, Y., Zhong, Z., Yang, F., Luo, Z., Lin, Y., Li, S., & Sebe, N. (2021). Learning to generalize unseen domains via memory-based multi-source meta-learning for person re-identification. In CVPR (pp. 6277–6286).
Zheng, K., Liu, W., He, L., Mei, T., Luo, J., & Zha, Z. J. (2021). Group-aware label transfer for domain adaptive person re-identification. In CVPR (pp. 5310–5319).
Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., & Tian, Q. (2015). Scalable person re-identification: A benchmark. In ICCV (pp. 1116–1124).
Zheng, L., Bie, Z., Sun, Y., Wang, J., Su, C., Wang, S., & Tian, Q. (2016a). Mars: A video benchmark for large-scale person re-identification. In ECCV (pp. 868–884). Springer.
Zheng, L., Yang, Y., & Hauptmann, A. G. (2016b). Person re-identification: Past, present and future. arXiv preprint arXiv:1610.02984
Zheng, L., Zhang, H., Sun, S., Chandraker, M., Yang, Y., & Tian, Q. (2017a). Person re-identification in the wild. In CVPR (pp. 1367–1376).
Zheng, W., Gong, S., & Xiang, T. (2009). Associating groups of people. In BMVC (pp. 1–11).
Zheng, Z., Zheng, L., & Yang, Y. (2017b). A discriminatively learned cnn embedding for person reidentification. ACM TOMM, 14(1), 1–20.
Zheng, Z., Zheng, L., & Yang, Y. (2017c). Unlabeled samples generated by gan improve the person re-identification baseline in vitro. In ICCV (pp. 3754–3762).
Zhong, Z., Zheng, L., Cao, D., & Li, S. (2017). Re-ranking person re-identification with k-reciprocal encoding. In CVPR (pp. 1318–1327).
Zhou, K., Yang, Y., Cavallaro, A., & Xiang, T. (2019). Omni-scale feature learning for person re-identification. In ICCV (pp. 3702–3712).
Zhou, K., Yang, J., Loy, C. C., & Liu, Z. (2022a). Learning to prompt for vision-language models. IJCV, 130(9), 2337–2348.
Zhou, M., Liu, H., Lv, Z., Hong, W., & Chen, X. (2022b). Motion-aware transformer for occluded person re-identification. arXiv preprint arXiv:2202.04243
Zhu, A., Wang, Z., Li, Y., Wan, X., Jin, J., Wang, T., Hu, F., & Hua, G. (2021a). Dssl: Deep surroundings-person separation learning for text-based person retrieval. In ACM MM (pp. 209–217).
Zhu, H., Ke, W., Li, D., Liu, J., Tian, L., & Shan, Y. (2022a). Dual cross-attention learning for fine-grained visual categorization and object re-identification. In CVPR (pp. 4692–4702).
Zhu, K., Guo, H., Zhang, S., Wang, Y., Huang, G., Qiao, H., Liu, J., Wang, J., & Tang, M. (2021b). Aaformer: Auto-aligned transformer for person re-identification. arXiv preprint arXiv:2104.00921
Zhu, K., Guo, H., Yan, T., Zhu, Y., Wang, J., & Tang, M. (2022b). Pass: Part-aware self-supervised pre-training for person re-identification. In ECCV (pp. 198–214). Cham: Springer.
Zhuo, J., Chen, Z., Lai, J., & Wang, G. (2018). Occluded person re-identification. In ICME (pp. 1–6). IEEE.
Zuerl, M., Dirauf, R., Koeferl, F., Steinlein, N., Sueskind, J., Zanca, D., Brehm, I., von Fersen, L., & Eskofier, B. (2023). Polarbearvidid: A video-based re-identification benchmark dataset for polar bears. Animals, 13(5), 801.
Zuo, J., Yu, C., Sang, N., & Gao, C. (2023). Plip: Language-image pre-training for person representation learning. arXiv preprint arXiv:2305.08386
Zuo, J., Zhou, H., Nie, Y., Zhang, F., Guo, T., Sang, N., Wang, Y., & Gao, C. (2024). Ufinebench: Towards text-based person retrieval with ultra-fine granularity. In CVPR (pp. 22010–22019).
Acknowledgements
This work is supported by National Natural Science Foundation of China (62176188, 62361166629, 62225113).
Additional information
Communicated by Bumsub Ham.
Ye, M., Chen, S., Li, C. et al. Transformer for Object Re-identification: A Survey. Int J Comput Vis 133, 2410–2440 (2025). https://doi.org/10.1007/s11263-024-02284-4