
Learning Hierarchical Visual Transformation for Domain Generalizable Visual Matching and Recognition

Published in: International Journal of Computer Vision

Abstract

Modern deep neural networks are prone to learning domain-dependent shortcuts and therefore often suffer severe performance degradation when tested in unseen target domains, owing to their poor out-of-distribution generalization; this significantly limits real-world applications. The main cause is the domain shift arising from the large distribution gap between source and unseen target data. This paper therefore takes a step towards training robust models for domain-generalizable visual tasks, focusing on learning domain-invariant visual representations to alleviate the domain shift. Specifically, we first propose an effective Hierarchical Visual Transformation (HVT) network that (1) transforms each training sample hierarchically into new domains with diverse distributions at three levels: global, local, and pixel; and (2) maximizes the visual discrepancy between the source domain and the new domains while minimizing cross-domain feature inconsistency, so as to capture domain-invariant features. We further enhance the HVT network by introducing environment-invariant learning: we enforce the invariance of the visual representation across automatically inferred environments by minimizing an invariant-learning loss defined as a weighted average of per-environment losses. In this way, we prevent the model from relying on spurious features for prediction, helping it learn domain-invariant representations and narrow the domain gap in various visual matching and recognition tasks, such as stereo matching, pedestrian retrieval, and image classification. We term the extended HVT as EHVT for distinction. We integrate our EHVT network into different models and evaluate its effectiveness and compatibility on several public benchmark datasets. Extensive experiments clearly show that EHVT can substantially enhance generalization performance across tasks.
Our codes are available at https://github.com/cty8998/EHVT-VisualDG.
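As a rough illustration of the objective described in the abstract, the following minimal sketch combines the two HVT terms (maximize visual discrepancy between the source sample and its transformed views; minimize cross-domain feature inconsistency) with EHVT's invariant-learning term, a weighted average of per-environment losses. The transformations and feature extractor here are hypothetical stand-ins, not the paper's exact operators:

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical stand-ins for the three transformation levels ---
def transform_global(x):   # e.g. an image-wide contrast/brightness shift
    return x * 1.2 + 0.1

def transform_local(x):    # e.g. perturb a local patch
    y = x.copy()
    y[2:6, 2:6] += 0.5
    return y

def transform_pixel(x):    # e.g. per-pixel noise
    return x + rng.normal(0.0, 0.05, size=x.shape)

def features(x):           # stand-in for a shared feature extractor
    return np.array([x.mean(), x.std()])

def hvt_losses(x):
    """Discrepancy term (maximized, so it enters with a minus sign) plus
    cross-domain feature-consistency term (minimized)."""
    views = [t(x) for t in (transform_global, transform_local, transform_pixel)]
    discrepancy = np.mean([np.mean((v - x) ** 2) for v in views])
    f_src = features(x)
    consistency = np.mean([np.sum((features(v) - f_src) ** 2) for v in views])
    return float(consistency - discrepancy)

def invariant_loss(env_losses, weights):
    """EHVT-style invariant-learning term: a weighted average of losses
    over automatically inferred environments."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return float(np.dot(w, np.asarray(env_losses, dtype=float)))

x = rng.random((8, 8))
total = hvt_losses(x) + invariant_loss([0.8, 1.3, 0.6], [1.0, 2.0, 1.0])
print(round(total, 4))
```

In a real training loop the per-environment losses would be task losses (e.g. disparity or classification error) computed on each inferred environment, and all terms would be backpropagated through the shared feature extractor.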


Data Availability

The authors confirm that the data supporting the findings of this study are available within the articles: (1) SceneFlow (Mayer et al., 2016), KITTI 2012 (Geiger et al., 2012), KITTI 2015 (Menze & Geiger, 2015), Middlebury (Scharstein et al., 2014), and ETH3D (Schops et al., 2017); (2) CUHK03 (Li et al., 2014), Market-1501 (Zheng et al., 2015), AlicePerson (Sun et al., 2023), MSMT17 (Wei et al., 2018), and RandPerson (Wang et al., 2020); (3) PACS (Li et al., 2017) and Office-Home (Venkateswara et al., 2017); (4) GTAV (Richter et al., 2016), SYNTHIA (Ros et al., 2016), CityScapes (Cordts et al., 2016), BDD100K (Yu et al., 2020), and Mapillary (Neuhold et al., 2017).

References

  • Arjovsky, M., Bottou, L., Gulrajani, I., & Lopez-Paz, D. (2019). Invariant risk minimization. arXiv:1907.02893

  • Bai, Y., Jiao, J., Ce, W., Liu, J., Lou, Y., Feng, X., & Duan, L. Y. (2021). Person30k: A dual-meta generalization network for person re-identification. In CVPR (pp. 2123–2132).

  • Beery, S., Van Horn, G., & Perona, P. (2018). Recognition in terra incognita. In ECCV (pp. 456–473).

  • Biswas, J., & Veloso, M. (2011). Depth camera based localization and navigation for indoor mobile robots. In RGB-D Workshop at RSS, Vol. 2011.

  • Cai, C., Poggi, M., Mattoccia, S., & Mordohai, P. (2020). Matching-space stereo networks for cross-domain generalization. In 3DV (pp. 364–373). IEEE.

  • Chang, J. R., & Chen, Y. S. (2018). Pyramid stereo matching network. In CVPR (pp. 5410–5418).

  • Chang, T., Yang, X., Luo, X., Ji, W., & Wang, M. (2023a). Learning style-invariant robust representation for generalizable visual instance retrieval. In Proceedings of the 31st ACM International Conference on Multimedia (pp. 6171–6180).

  • Chang, T., Yang, X., Zhang, T., & Wang, M. (2023b). Domain generalized stereo matching via hierarchical visual transformation. In CVPR (pp. 9559–9568).

  • Chang, S., Zhang, Y., Yu, M., & Jaakkola, T. (2020). Invariant rationalization. In ICML (pp. 1448–1458). PMLR.

  • Chen, C., Li, Z., Ouyang, C., Sinclair, M., Bai, W., & Rueckert, D. (2022). Maxstyle: Adversarial style composition for robust medical image segmentation. In MICCAI (pp. 151–161). Springer.

  • Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2017). Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4), 834–848.

  • Choi, S., Jung, S., Yun, H., Kim, J. T., Kim, S., & Choo, J. (2021a). Robustnet: Improving domain generalization in urban-scene segmentation via instance selective whitening. In CVPR (pp. 11580–11590).

  • Choi, S., Kim, T., Jeong, M., Park, H., & Kim, C. (2021b). Meta batch-instance normalization for generalizable person re-identification. In CVPR (pp. 3425–3435).

  • Chuah, W., Tennakoon, R., Hoseinnezhad, R., Bab-Hadiashar, A., & Suter, D. (2022). Itsa: An information-theoretic approach to automatic shortcut avoidance and domain generalization in stereo matching networks. In CVPR (pp. 13022–13032).

  • Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., & Schiele, B. (2016). The cityscapes dataset for semantic urban scene understanding. In CVPR (pp. 3213–3223).

  • Cui, Y., Tao, Y., Ren, W., & Knoll, A. (2023). Dual-domain attention for image deblurring. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 37, pp. 479–487).

  • Dai, R., Shen, L., He, F., Tian, X., & Tao, D. (2022). Dispfl: Towards communication-efficient personalized federated learning via decentralized sparse training. In ICML (pp. 4587–4604). PMLR.

  • Dong, J., Li, X., Xu, C., Yang, X., Yang, G., Wang, X., & Wang, M. (2021). Dual encoding for video retrieval by text. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(8), 4065–4080.

  • Fathy, M. E., Tran, Q. H., Zia, M. Z., Vernaza, P., & Chandraker, M. (2018). Hierarchical metric learning and matching for 2d and 3d geometric correspondences. In ECCV (pp. 803–819).

  • Geiger, A., Lenz, P., & Urtasun, R. (2012). Are we ready for autonomous driving? The Kitti vision benchmark suite. In CVPR (pp. 3354–3361). IEEE.

  • Gu, X., Fan, Z., Zhu, S., Dai, Z., Tan, F., & Tan, P. (2020). Cascade cost volume for high-resolution multi-view stereo and stereo matching. In CVPR (pp. 2495–2504).

  • Guo, X., Yang, K., Yang, W., Wang, X., & Li, H. (2019). Group-wise correlation stereo network. In CVPR (pp. 3273–3282).

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In CVPR (pp. 770–778).

  • Huang, Z., Wang, H., Xing, E. P., & Huang, D. (2020). Self-challenging improves cross-domain generalization. In ECCV (pp. 124–140). Springer.

  • Huang, L., Zhou, Y., Zhu, F., Liu, L., & Shao, L. (2019). Iterative normalization: Beyond standardization towards efficient whitening. In CVPR (pp. 4874–4883).

  • Huang, B. W., Liao, K. T., Kao, C. S., & Lin, S. D. (2022). Environment diversification with multi-head neural network for invariant learning. NeurIPS, 35, 915–927.

  • Hu, Y., He, H., Xu, C., Wang, B., & Lin, S. (2018). Exposure: A white-box photo post-processing framework. ACM Transactions on Graphics (TOG), 37(2), 1–17.

  • Jiang, B., Wang, X., Zheng, A., Tang, J., & Luo, B. (2021). Ph-gcn: Person retrieval with part-based hierarchical graph convolutional network. IEEE Transactions on Multimedia, 24, 3218–3228.

  • Jiao, B., Liu, L., Gao, L., Lin, G., Yang, L., Zhang, S., Wang, P., & Zhang, Y. (2022). Dynamically transformed instance normalization network for generalizable person re-identification. In ECCV (pp. 285–301). Springer.

  • Jin, X., Lan, C., Zeng, W., Chen, Z., & Zhang, L. (2020). Style normalization and restitution for generalizable person re-identification. In CVPR (pp. 3143–3152).

  • Kamath, P., Tangella, A., Sutherland, D., & Srebro, N. (2021). Does invariant risk minimization capture invariance? In Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research (Vol. 130, pp. 4069–4077). PMLR.

  • Kang, G., Jiang, L., Yang, Y., & Hauptmann, A. G. (2019). Contrastive adaptation network for unsupervised domain adaptation. In CVPR (pp. 4893–4902).

  • Kang, J., Lee, S., Kim, N., & Kwak, S. (2022). Style neophile: Constantly seeking novel styles for domain generalization. In CVPR (pp. 7130–7140).

  • Kendall, A., Martirosyan, H., Dasgupta, S., Henry, P., Kennedy, R., Bachrach, A., & Bry, A. (2017). End-to-end learning of geometry and context for deep stereo regression. In ICCV (pp. 66–75).

  • Krizhevsky, A., Sutskever, I., & Hinton, G.E. (2012). Imagenet classification with deep convolutional neural networks. NeurIPS 25.

  • Krueger, D., Caballero, E., Jacobsen, J.H., Zhang, A., Binas, J., Zhang, D., Le Priol, R., & Courville, A. (2021). Out-of-distribution generalization via risk extrapolation (rex). In International Conference on Machine Learning (pp. 5815–5826). PMLR.

  • Li, X., Dai, Y., Ge, Y., Liu, J., Shan, Y., & Duan, L. Y. (2022). Uncertainty modeling for out-of-distribution generalization. arXiv:2202.03958

  • Li, X., Lu, Y., Liu, B., Hou, Y., Liu, Y., Chu, Q., Ouyang, W., & Yu, N. (2023). Clothes-invariant feature learning by causal intervention for clothes-changing person re-identification. arXiv:2305.06145

  • Li, H., Pan, S. J., Wang, S., & Kot, A. C. (2018). Domain generalization with adversarial feature learning. In CVPR (pp. 5400–5409).

  • Li, D., Yang, Y., Song, Y. Z., & Hospedales, T. M. (2017). Deeper, broader and artier domain generalization. In ICCV (pp. 5542–5550).

  • Li, W., Zhao, R., Xiao, T., & Wang, X. (2014). Deepreid: Deep filter pairing neural network for person re-identification. In CVPR (pp. 152–159).

  • Liao, S., & Shao, L. (2020). Interpretable and generalizable person re-identification with query-adaptive convolution and temporal lifting. In ECCV (pp. 456–474). Springer.

  • Liao, S., & Shao, L. (2022). Graph sampling based deep metric learning for generalizable person re-identification. In CVPR (pp. 7359–7368).

  • Liao, S., & Shao, L. (2021). Transmatcher: Deep image matching through transformers for generalizable person re-identification. NeurIPS, 34, 1992–2003.

  • Lin, Y., Lian, Q., & Zhang, T. (2021). An empirical study of invariant risk minimization on deep models. In ICML Workshop on Uncertainty and Robustness in Deep Learning (Vol. 1, p. 7).

  • Lipson, L., Teed, Z., & Deng, J. (2021). Raft-stereo: Multilevel recurrent field transforms for stereo matching. In 3DV (pp. 218–227). IEEE.

  • Liu, B., Yu, H., & Qi, G. (2022). Graftnet: Towards domain generalized stereo matching with a broad-spectrum and task-oriented feature. In CVPR (pp. 13012–13021).

  • Liu, X., Yang, X., Wang, M., & Hong, R. (2020). Deep neighborhood component analysis for visual similarity modeling. ACM Transactions on Intelligent Systems and Technology (TIST), 11, 1–15.

  • Lv, F., Liang, J., Li, S., Zang, B., Liu, C.H., Wang, Z., & Liu, D. (2022). Causality inspired representation learning for domain generalization. In CVPR (pp. 8046–8056).

  • Mayer, N., Ilg, E., Hausser, P., Fischer, P., Cremers, D., Dosovitskiy, A., & Brox, T. (2016). A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In CVPR (pp. 4040–4048).

  • Menze, M., & Geiger, A. (2015). Object scene flow for autonomous vehicles. In CVPR (pp. 3061–3070).

  • Mu, J., Li, Y., Li, J., & Yang, J. (2022). Learning clothes-irrelevant cues for clothes-changing person re-identification. In BMVC.

  • Neuhold, G., Ollmann, T., Rota Bulo, S., & Kontschieder, P. (2017). The mapillary vistas dataset for semantic understanding of street scenes. In ICCV (pp. 4990–4999).

  • Ni, H., Song, J., Luo, X., Zheng, F., Li, W., & Shen, H. T. (2022). Meta distribution alignment for generalizable person re-identification. In CVPR (pp. 2487–2496).

  • Pan, X., Luo, P., Shi, J., & Tang, X. (2018). Two at once: Enhancing learning and generalization capacities via ibn-net. In ECCV (pp. 464–479).

  • Pan, X., Zhan, X., Shi, J., Tang, X., & Luo, P. (2019). Switchable whitening for deep representation learning. In ICCV (pp. 1863–1871).

  • Peng, D., Lei, Y., Hayat, M., Guo, Y., & Li, W. (2022). Semantic-aware domain generalized segmentation. In CVPR (pp. 2594–2605).

  • Peng, D., Lei, Y., Liu, L., Zhang, P., & Liu, J. (2021). Global and local texture randomization for synthetic-to-real semantic segmentation. IEEE Transactions on Image Processing, 30, 6594–6608.

  • Radenović, F., Iscen, A., Tolias, G., Avrithis, Y., & Chum, O. (2018). Revisiting oxford and paris: Large-scale image retrieval benchmarking. In CVPR (pp. 5706–5715).

  • Richter, S. R., Vineet, V., Roth, S., & Koltun, V. (2016). Playing for data: Ground truth from computer games. In ECCV (pp. 102–118). Springer.

  • Ros, G., Sellart, L., Materzynska, J., Vazquez, D., & Lopez, A. M. (2016). The synthia dataset: A large collection of synthetic images for semantic segmentation of urban scenes. In CVPR (pp. 3234–3243).

  • Saito, K., Watanabe, K., Ushiku, Y., & Harada, T. (2018). Maximum classifier discrepancy for unsupervised domain adaptation. In CVPR (pp. 3723–3732).

  • Scharstein, D., Hirschmüller, H., Kitajima, Y., Krathwohl, G., Nešić, N., Wang, X., & Westling, P. (2014). High-resolution stereo datasets with subpixel-accurate ground truth. In German conference on pattern recognition (pp. 31–42). Springer.

  • Schops, T., Schonberger, J. L., Galliani, S., Sattler, T., Schindler, K., Pollefeys, M., & Geiger, A. (2017). A multi-view stereo benchmark with high-resolution images and multi-camera videos. In CVPR (pp. 3260–3269).

  • Shen, Z., Dai, Y., & Rao, Z. (2021). Cfnet: Cascade and fused cost volume for robust stereo matching. In CVPR (pp. 13906–13915).

  • Song, P., Guo, D., Yang, X., Tang, S., & Wang, M. (2024). Emotional video captioning with vision-based emotion interpretation network. IEEE Transactions on Image Processing.

  • Sun, C., Vianney, J. M. U., & Cao, D. (2019). Affordance learning in direct perception for autonomous driving. arXiv:1903.08746

  • Sun, X., Yao, Y., Wang, S., Li, H., & Zheng, L. (2023). Alice benchmarks: Connecting real world object re-identification with the synthetic. arXiv:2310.04416

  • Venkateswara, H., Eusebio, J., Chakraborty, S., & Panchanathan, S. (2017). Deep hashing network for unsupervised domain adaptation. In CVPR (pp. 5018–5027).

  • Wang, J., Lan, C., Liu, C., Ouyang, Y., Qin, T., Lu, W., Chen, Y., Zeng, W., & Yu, P. (2022a). Generalizing to unseen domains: A survey on domain generalization. IEEE Transactions on Knowledge and Data Engineering.

  • Wang, Y., Liao, S., & Shao, L. (2020). Surpassing real-world source training data: Random 3d characters for generalizable person re-identification. In ACM MM (pp. 3422–3430).

  • Wang, Z., Luo, Y., Qiu, R., Huang, Z., & Baktashmotlagh, M. (2021). Learning to diversify for single domain generalization. In ICCV (pp. 834–843).

  • Wang, R., Yi, M., Chen, Z., & Zhu, S. (2022b). Out-of-distribution generalization with causal invariant transformations. In CVPR (pp. 375–385).

  • Wei, L., Zhang, S., Gao, W., & Tian, Q. (2018). Person transfer gan to bridge domain gap for person re-identification. In CVPR (pp. 79–88).

  • Xie, C., Ye, H., Chen, F., Liu, Y., Sun, R., & Li, Z. (2020). Risk variance penalization. arXiv:2006.07544

  • Xu, Q., Zhang, R., Zhang, Y., Wang, Y., & Tian, Q. (2021). A fourier-based framework for domain generalization. In CVPR (pp. 14383–14392).

  • Yang, X., Feng, F., Ji, W., Wang, M., & Chua, T. S. (2021). Deconfounded video moment retrieval with causal intervention. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval.

  • Yang, G., Song, X., Huang, C., Deng, Z., Shi, J., & Zhou, B. (2019). Drivingstereo: A large-scale dataset for stereo matching in autonomous driving scenarios. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 899–908).

  • Yan, C., Gong, B., Wei, Y., & Gao, Y. (2020). Deep multi-view enhancement hashing for image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(4), 1445–1451.

  • Yang, X., Wang, S., Dong, J., Dong, J., Wang, M., & Chua, T. S. (2022). Video moment retrieval with cross-modal neural architecture search. IEEE Transactions on Image Processing, 31, 1204–1216.

  • Yang, X., Zhou, P., & Wang, M. (2019). Person reidentification via structural deep metric learning. IEEE Transactions on Neural Networks and Learning Systems, 30(10), 2987–2998.

  • Yan, C., Pang, G., Bai, X., Liu, C., Ning, X., Gu, L., & Zhou, J. (2021). Beyond triplet loss: Person re-identification with fine-grained difference-aware pairwise loss. IEEE Transactions on Multimedia, 24, 1665–1677.

  • Yao, C., Jia, Y., Di, H., Li, P., & Wu, Y. (2021). A decomposition model for stereo matching. In CVPR (pp. 6091–6100).

  • Yu, F., Chen, H., Wang, X., Xian, W., Chen, Y., Liu, F., Madhavan, V., & Darrell, T. (2020). Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In CVPR (pp. 2636–2645).

  • Yu, Y., Khadivi, S., & Xu, J. (2022). Can data diversity enhance learning generalization? In Proceedings of the 29th International Conference on Computational Linguistics (pp. 4933–4945).

  • Yue, X., Zhang, Y., Zhao, S., Sangiovanni-Vincentelli, A., Keutzer, K., & Gong, B. (2019). Domain randomization and pyramid consistency: Simulation-to-real generalization without accessing target domain data. In ICCV (pp. 2100–2110).

  • Zbontar, J., & LeCun, Y. (2015). Computing the stereo matching cost with a convolutional neural network. In CVPR (pp. 1592–1599).

  • Zhang, H., Cisse, M., Dauphin, Y. N., & Lopez-Paz, D. (2018). mixup: Beyond empirical risk minimization. In International Conference on Learning Representations.

  • Zhang, Y., Deng, B., Li, R., Jia, K., & Zhang, L. (2023). Adversarial style augmentation for domain generalization. arXiv:2301.12643

  • Zhang, P., Dou, H., Yu, Y., & Li, X. (2022b). Adaptive cross-domain learning for generalizable person re-identification. In ECCV (pp. 215–232). Springer.

  • Zhang, Y., Li, M., Li, R., Jia, K., & Zhang, L. (2022c). Exact feature distribution matching for arbitrary style transfer and domain generalization. In CVPR (pp. 8035–8045).

  • Zhang, F., Prisacariu, V., Yang, R., & Torr, P.H. (2019). Ga-net: Guided aggregation net for end-to-end stereo matching. In CVPR (pp. 185–194).

  • Zhang, F., Qi, X., Yang, R., Prisacariu, V., Wah, B., & Torr, P. (2020). Domain-invariant stereo matching networks. In ECCV (pp. 420–439). Springer.

  • Zhang, A., Ren, W., Liu, Y., & Cao, X. (2023). Lightweight image super-resolution with superpixel token interaction. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 12728–12737).

  • Zhang, J., Wang, X., Bai, X., Wang, C., Huang, L., Chen, Y., Gu, L., Zhou, J., Harada, T., & Hancock, E. R. (2022a). Revisiting domain generalized stereo matching networks from a feature consistency perspective. In CVPR (pp. 13001–13011).

  • Zhang, F., & Wah, B. W. (2017). Fundamental principles on learning new features for effective dense matching. IEEE Transactions on Image Processing, 27(2), 822–836.

  • Zhao, Y., Zhong, Z., Yang, F., Luo, Z., Lin, Y., Li, S., & Sebe, N. (2021). Learning to generalize unseen domains via memory-based multi-source meta-learning for person re-identification. In CVPR (pp. 6277–6286).

  • Zhao, Y., Zhong, Z., Zhao, N., Sebe, N., & Lee, G.H. (2022). Style-hallucinated dual consistency learning for domain generalized semantic segmentation. In ECCV (pp. 535–552). Springer.

  • Zhao, Y., Zhong, Z., Zhao, N., Sebe, N., & Lee, G. H. (2024). Style-hallucinated dual consistency learning: A unified framework for visual domain generalization. International Journal of Computer Vision, 132(3), 837–853.

  • Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., & Tian, Q. (2015). Scalable person re-identification: A benchmark. In ICCV (pp. 1116–1124).

  • Zhong, Z., Zheng, L., Cao, D., & Li, S. (2017). Re-ranking person re-identification with k-reciprocal encoding. In CVPR (pp. 1318–1327).

  • Zhong, Z., Zheng, L., Kang, G., Li, S., & Yang, Y. (2020a). Random erasing data augmentation. In Proceedings of the AAAI conference on artificial intelligence (Vol. 34, pp. 13001–13008).

  • Zhong, Z., Zhao, Y., Lee, G. H., & Sebe, N. (2022). Adversarial style augmentation for domain generalized urban-scene segmentation. NeurIPS, 35, 338–350.

  • Zhong, Z., Zheng, L., Luo, Z., Li, S., & Yang, Y. (2020b). Learning to adapt invariance in memory for person re-identification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(8), 2723–2738.

  • Zhong, Z., Zheng, L., Zheng, Z., Li, S., & Yang, Y. (2018). Camstyle: A novel data augmentation method for person re-identification. IEEE Transactions on Image Processing, 28(3), 1176–1190.

  • Zhou, S., Guo, D., Li, J., Yang, X., & Wang, M. (2023). Exploring sparse spatial relation in graph inference for text-based vqa. IEEE Transactions on Image Processing.

  • Zhou, K., Yang, Y., Hospedales, T., & Xiang, T. (2020). Learning to generate novel domains for domain generalization. In ECCV (pp. 561–578). Springer.

  • Zhou, K., Yang, Y., Qiao, Y., & Xiang, T. (2021b). Domain generalization with mixstyle. arXiv:2104.02008

  • Zhou, S., Guo, D., Yang, X., Dong, J., & Wang, M. (2024). Graph pooling inference network for text-based vqa. ACM Transactions on Multimedia Computing, Communications, and Applications, 20(4), 1–21.

  • Zhou, K., Yang, Y., Cavallaro, A., & Xiang, T. (2021a). Learning generalisable omni-scale representations for person re-identification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(9), 5056–5069.

  • Zhuang, Z., Wei, L., Xie, L., Zhang, T., Zhang, H., Wu, H., Ai, H., & Tian, Q. (2020). Rethinking the distribution gap of person re-identification with camera-based batch normalization. In ECCV (pp. 140–157). Springer.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (NSFC) under Grant U22A2094, Grant 62272435, and Grant 72188101.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Xun Yang.

Additional information

Communicated by Zhun Zhong.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yang, X., Chang, T., Zhang, T. et al. Learning Hierarchical Visual Transformation for Domain Generalizable Visual Matching and Recognition. Int J Comput Vis 132, 4823–4849 (2024). https://doi.org/10.1007/s11263-024-02106-7