Abstract
Modern deep neural networks are prone to learning domain-dependent shortcuts and therefore often suffer severe performance degradation when tested in unseen target domains, owing to their poor out-of-distribution generalization; this significantly limits their real-world applicability. The main cause is domain shift: the large distribution gap between source data and unseen target data. To this end, this paper takes a step towards training robust models for domain-generalizable visual tasks, focusing on learning domain-invariant visual representations to alleviate domain shift. Specifically, we propose an effective Hierarchical Visual Transformation (HVT) network that (1) transforms each training sample hierarchically into new domains with diverse distributions at three levels: Global, Local, and Pixel, and (2) maximizes the visual discrepancy between the source domain and the new domains while minimizing cross-domain feature inconsistency, so as to capture domain-invariant features. We further enhance the HVT network by introducing environment-invariant learning. Specifically, we enforce invariance of the visual representation across automatically inferred environments by minimizing an invariant-learning loss defined over a weighted average of per-environment losses. In this way, we prevent the model from relying on spurious features for prediction, helping it learn domain-invariant representations and narrow the domain gap in various visual matching and recognition tasks, such as stereo matching, pedestrian retrieval, and image classification. We term the extended network EHVT to distinguish it from the original HVT. We integrate EHVT into different models and evaluate its effectiveness and compatibility on several public benchmark datasets. Extensive experiments show that EHVT substantially enhances generalization performance across these tasks. Our code is available at https://github.com/cty8998/EHVT-VisualDG.
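Purely as an illustration of how the ingredients described above fit together, the following PyTorch-style sketch combines hierarchical transformations, cross-domain feature consistency, and an environment-weighted invariance penalty in a single training-step loss. It is not the authors' implementation (see the repository linked above): the function `hvt_transform`, the `(features, logits)` model interface, and the loss weights are hypothetical stand-ins, the learned transformation networks are reduced to fixed random perturbations, and the adversarial step that maximizes visual discrepancy with respect to the transformation parameters is omitted.

```python
# Minimal sketch of an EHVT-style training objective (illustrative only;
# NOT the authors' released code). All names here are hypothetical.
import torch
import torch.nn.functional as F

def hvt_transform(x, level):
    """Stand-in for the Global / Local / Pixel transformations.

    The paper learns these transformations (and adversarially maximizes the
    visual discrepancy they induce); here each level is approximated by a
    fixed random perturbation of matching granularity.
    """
    if level == "global":  # image-wide appearance shift
        scale = torch.empty(x.size(0), 1, 1, 1, device=x.device).uniform_(0.5, 1.5)
        return x * scale
    if level == "local":   # coarse patch-wise perturbation
        mask = torch.rand(x.size(0), 1, 8, 8, device=x.device)
        mask = F.interpolate(mask, size=x.shape[-2:], mode="nearest")
        return x * (0.5 + mask)
    return x + 0.1 * torch.randn_like(x)  # pixel-level noise

def ehvt_loss(model, x, y, task_loss, w_cons=1.0, w_env=1.0):
    """One training step's loss: task term + feature consistency + invariance."""
    feat_src, pred_src = model(x)  # assumed (features, logits) interface
    loss = task_loss(pred_src, y)
    env_losses = []
    for level in ("global", "local", "pixel"):
        feat_new, pred_new = model(hvt_transform(x, level))
        # Cross-domain feature consistency: pull features of the
        # transformed domains toward the source-domain features.
        loss = loss + w_cons * F.mse_loss(feat_new, feat_src)
        env_losses.append(task_loss(pred_new, y))
    # Environment-invariant term: a weighted average of per-environment
    # losses plus their variance (a V-REx-style penalty; the paper infers
    # environments automatically rather than equating them with the levels).
    env_losses = torch.stack(env_losses)
    loss = loss + env_losses.mean() + w_env * env_losses.var()
    return loss
```

In a fuller version, the transformation parameters and the task network would be updated in alternating steps (maximizing discrepancy, then minimizing the loss above); the sketch shows only the minimization side.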
Data Availability
The authors confirm that the data supporting the findings of this study are available within the articles: (1) SceneFlow (Mayer et al., 2016), KITTI 2012 (Geiger et al., 2012), KITTI 2015 (Menze & Geiger, 2015), Middlebury (Scharstein et al., 2014), and ETH3D (Schops et al., 2017); (2) CUHK03 (Li et al., 2014), Market-1501 (Zheng et al., 2015), AlicePerson (Sun et al., 2023), MSMT17 (Wei et al., 2018), and RandPerson (Wang et al., 2020); (3) PACS (Li et al., 2017) and Office-Home (Venkateswara et al., 2017); (4) GTAV (Richter et al., 2016), SYNTHIA (Ros et al., 2016), CityScapes (Cordts et al., 2016), BDD100K (Yu et al., 2020), and Mapillary (Neuhold et al., 2017).
References
Arjovsky, M., Bottou, L., Gulrajani, I., & Lopez-Paz, D. (2019). Invariant risk minimization. arXiv:1907.02893
Bai, Y., Jiao, J., Ce, W., Liu, J., Lou, Y., Feng, X., & Duan, L. Y. (2021). Person30k: A dual-meta generalization network for person re-identification. In CVPR (pp. 2123–2132).
Beery, S., Van Horn, G., & Perona, P. (2018). Recognition in terra incognita. In ECCV (pp. 456–473).
Biswas, J., & Veloso, M. (2011). Depth camera based localization and navigation for indoor mobile robots. In RGB-D Workshop at RSS, Vol. 2011.
Cai, C., Poggi, M., Mattoccia, S., & Mordohai, P. (2020). Matching-space stereo networks for cross-domain generalization. In 3DV (pp. 364–373). IEEE.
Chang, J. R., & Chen, Y. S. (2018). Pyramid stereo matching network. In CVPR (pp. 5410–5418).
Chang, T., Yang, X., Luo, X., Ji, W., & Wang, M. (2023a). Learning style-invariant robust representation for generalizable visual instance retrieval. In ACM MM (pp. 6171–6180).
Chang, T., Yang, X., Zhang, T., & Wang, M. (2023b). Domain generalized stereo matching via hierarchical visual transformation. In CVPR (pp. 9559–9568).
Chang, S., Zhang, Y., Yu, M., & Jaakkola, T. (2020). Invariant rationalization. In ICML (pp. 1448–1458). PMLR.
Chen, C., Li, Z., Ouyang, C., Sinclair, M., Bai, W., & Rueckert, D. (2022). Maxstyle: Adversarial style composition for robust medical image segmentation. In MICCAI (pp. 151–161). Springer.
Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2017). Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4), 834–848.
Choi, S., Jung, S., Yun, H., Kim, J. T., Kim, S., & Choo, J. (2021a). Robustnet: Improving domain generalization in urban-scene segmentation via instance selective whitening. In CVPR (pp. 11580–11590).
Choi, S., Kim, T., Jeong, M., Park, H., & Kim, C. (2021b). Meta batch-instance normalization for generalizable person re-identification. In CVPR (pp. 3425–3435).
Chuah, W., Tennakoon, R., Hoseinnezhad, R., Bab-Hadiashar, A., & Suter, D. (2022). Itsa: An information-theoretic approach to automatic shortcut avoidance and domain generalization in stereo matching networks. In CVPR (pp. 13022–13032).
Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., & Schiele, B. (2016). The cityscapes dataset for semantic urban scene understanding. In CVPR (pp. 3213–3223).
Cui, Y., Tao, Y., Ren, W., & Knoll, A. (2023). Dual-domain attention for image deblurring. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 37, pp. 479–487).
Dai, R., Shen, L., He, F., Tian, X., & Tao, D. (2022). Dispfl: Towards communication-efficient personalized federated learning via decentralized sparse training. In ICML (pp. 4587–4604). PMLR.
Dong, J., Li, X., Xu, C., Yang, X., Yang, G., Wang, X., & Wang, M. (2021). Dual encoding for video retrieval by text. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(8), 4065–4080.
Fathy, M. E., Tran, Q. H., Zia, M. Z., Vernaza, P., & Chandraker, M. (2018). Hierarchical metric learning and matching for 2d and 3d geometric correspondences. In ECCV (pp. 803–819).
Geiger, A., Lenz, P., & Urtasun, R. (2012). Are we ready for autonomous driving? The Kitti vision benchmark suite. In CVPR (pp. 3354–3361). IEEE.
Gu, X., Fan, Z., Zhu, S., Dai, Z., Tan, F., & Tan, P. (2020). Cascade cost volume for high-resolution multi-view stereo and stereo matching. In CVPR (pp. 2495–2504).
Guo, X., Yang, K., Yang, W., Wang, X., & Li, H. (2019). Group-wise correlation stereo network. In CVPR (pp. 3273–3282).
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In CVPR (pp. 770–778).
Huang, Z., Wang, H., Xing, E. P., & Huang, D. (2020). Self-challenging improves cross-domain generalization. In ECCV (pp. 124–140). Springer.
Huang, L., Zhou, Y., Zhu, F., Liu, L., & Shao, L. (2019). Iterative normalization: Beyond standardization towards efficient whitening. In CVPR (pp. 4874–4883).
Huang, B. W., Liao, K. T., Kao, C. S., & Lin, S. D. (2022). Environment diversification with multi-head neural network for invariant learning. NeurIPS, 35, 915–927.
Hu, Y., He, H., Xu, C., Wang, B., & Lin, S. (2018). Exposure: A white-box photo post-processing framework. ACM Transactions on Graphics (TOG), 37(2), 1–17.
Jiang, B., Wang, X., Zheng, A., Tang, J., & Luo, B. (2021). Ph-gcn: Person retrieval with part-based hierarchical graph convolutional network. IEEE Transactions on Multimedia, 24, 3218–3228.
Jiao, B., Liu, L., Gao, L., Lin, G., Yang, L., Zhang, S., Wang, P., & Zhang, Y. (2022). Dynamically transformed instance normalization network for generalizable person re-identification. In ECCV (pp. 285–301). Springer.
Jin, X., Lan, C., Zeng, W., Chen, Z., & Zhang, L. (2020). Style normalization and restitution for generalizable person re-identification. In CVPR (pp. 3143–3152).
Kamath, P., Tangella, A., Sutherland, D., & Srebro, N. (2021). Does invariant risk minimization capture invariance? In AISTATS (pp. 4069–4077). PMLR.
Kang, G., Jiang, L., Yang, Y., & Hauptmann, A. G. (2019). Contrastive adaptation network for unsupervised domain adaptation. In CVPR (pp. 4893–4902).
Kang, J., Lee, S., Kim, N., & Kwak, S. (2022). Style neophile: Constantly seeking novel styles for domain generalization. In CVPR (pp. 7130–7140).
Kendall, A., Martirosyan, H., Dasgupta, S., Henry, P., Kennedy, R., Bachrach, A., & Bry, A. (2017). End-to-end learning of geometry and context for deep stereo regression. In ICCV (pp. 66–75).
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. NeurIPS, 25.
Krueger, D., Caballero, E., Jacobsen, J. H., Zhang, A., Binas, J., Zhang, D., Le Priol, R., & Courville, A. (2021). Out-of-distribution generalization via risk extrapolation (REx). In ICML (pp. 5815–5826). PMLR.
Li, X., Dai, Y., Ge, Y., Liu, J., Shan, Y., & Duan, L. Y. (2022). Uncertainty modeling for out-of-distribution generalization. arXiv:2202.03958
Li, X., Lu, Y., Liu, B., Hou, Y., Liu, Y., Chu, Q., Ouyang, W., & Yu, N. (2023). Clothes-invariant feature learning by causal intervention for clothes-changing person re-identification. arXiv:2305.06145
Li, H., Pan, S. J., Wang, S., & Kot, A. C. (2018). Domain generalization with adversarial feature learning. In CVPR (pp. 5400–5409).
Li, D., Yang, Y., Song, Y. Z., & Hospedales, T. M. (2017). Deeper, broader and artier domain generalization. In ICCV (pp. 5542–5550).
Li, W., Zhao, R., Xiao, T., & Wang, X. (2014). Deepreid: Deep filter pairing neural network for person re-identification. In CVPR (pp. 152–159).
Liao, S., & Shao, L. (2020). Interpretable and generalizable person re-identification with query-adaptive convolution and temporal lifting. In ECCV (pp. 456–474). Springer.
Liao, S., & Shao, L. (2022). Graph sampling based deep metric learning for generalizable person re-identification. In CVPR (pp. 7359–7368).
Liao, S., & Shao, L. (2021). Transmatcher: Deep image matching through transformers for generalizable person re-identification. NeurIPS, 34, 1992–2003.
Lin, Y., Lian, Q., & Zhang, T. (2021). An empirical study of invariant risk minimization on deep models. In ICML Workshop on Uncertainty and Robustness in Deep Learning (Vol. 1, p. 7).
Lipson, L., Teed, Z., & Deng, J. (2021). Raft-stereo: Multilevel recurrent field transforms for stereo matching. In 3DV (pp. 218–227). IEEE.
Liu, B., Yu, H., & Qi, G. (2022). Graftnet: Towards domain generalized stereo matching with a broad-spectrum and task-oriented feature. In CVPR (pp. 13012–13021).
Liu, X., Yang, X., Wang, M., & Hong, R. (2020). Deep neighborhood component analysis for visual similarity modeling. ACM Transactions on Intelligent Systems and Technology (TIST), 11, 1–15.
Lv, F., Liang, J., Li, S., Zang, B., Liu, C.H., Wang, Z., & Liu, D. (2022). Causality inspired representation learning for domain generalization. In CVPR (pp. 8046–8056).
Mayer, N., Ilg, E., Hausser, P., Fischer, P., Cremers, D., Dosovitskiy, A., & Brox, T. (2016). A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In CVPR (pp. 4040–4048).
Menze, M., & Geiger, A. (2015). Object scene flow for autonomous vehicles. In CVPR (pp. 3061–3070).
Mu, J., Li, Y., Li, J., & Yang, J. (2022). Learning clothes-irrelevant cues for clothes-changing person re-identification. In BMVC.
Neuhold, G., Ollmann, T., Rota Bulo, S., & Kontschieder, P. (2017). The mapillary vistas dataset for semantic understanding of street scenes. In ICCV (pp. 4990–4999).
Ni, H., Song, J., Luo, X., Zheng, F., Li, W., & Shen, H. T. (2022). Meta distribution alignment for generalizable person re-identification. In CVPR (pp. 2487–2496).
Pan, X., Luo, P., Shi, J., & Tang, X. (2018). Two at once: Enhancing learning and generalization capacities via ibn-net. In ECCV (pp. 464–479).
Pan, X., Zhan, X., Shi, J., Tang, X., & Luo, P. (2019). Switchable whitening for deep representation learning. In ICCV (pp. 1863–1871).
Peng, D., Lei, Y., Hayat, M., Guo, Y., & Li, W. (2022). Semantic-aware domain generalized segmentation. In CVPR (pp. 2594–2605).
Peng, D., Lei, Y., Liu, L., Zhang, P., & Liu, J. (2021). Global and local texture randomization for synthetic-to-real semantic segmentation. IEEE Transactions on Image Processing, 30, 6594–6608.
Radenović, F., Iscen, A., Tolias, G., Avrithis, Y., & Chum, O. (2018). Revisiting oxford and paris: Large-scale image retrieval benchmarking. In CVPR (pp. 5706–5715).
Richter, S. R., Vineet, V., Roth, S., & Koltun, V. (2016). Playing for data: Ground truth from computer games. In ECCV (pp. 102–118). Springer.
Ros, G., Sellart, L., Materzynska, J., Vazquez, D., & Lopez, A. M. (2016). The synthia dataset: A large collection of synthetic images for semantic segmentation of urban scenes. In CVPR (pp. 3234–3243).
Saito, K., Watanabe, K., Ushiku, Y., & Harada, T. (2018). Maximum classifier discrepancy for unsupervised domain adaptation. In CVPR (pp. 3723–3732).
Scharstein, D., Hirschmüller, H., Kitajima, Y., Krathwohl, G., Nešić, N., Wang, X., & Westling, P. (2014). High-resolution stereo datasets with subpixel-accurate ground truth. In German conference on pattern recognition (pp. 31–42). Springer.
Schops, T., Schonberger, J. L., Galliani, S., Sattler, T., Schindler, K., Pollefeys, M., & Geiger, A. (2017). A multi-view stereo benchmark with high-resolution images and multi-camera videos. In CVPR (pp. 3260–3269).
Shen, Z., Dai, Y., & Rao, Z. (2021). Cfnet: Cascade and fused cost volume for robust stereo matching. In CVPR (pp. 13906–13915).
Song, P., Guo, D., Yang, X., Tang, S., & Wang, M. (2024). Emotional video captioning with vision-based emotion interpretation network. IEEE Transactions on Image Processing.
Sun, C., Vianney, J. M. U., & Cao, D. (2019). Affordance learning in direct perception for autonomous driving. arXiv:1903.08746
Sun, X., Yao, Y., Wang, S., Li, H., & Zheng, L. (2023). Alice benchmarks: Connecting real world object re-identification with the synthetic. arXiv:2310.04416
Venkateswara, H., Eusebio, J., Chakraborty, S., & Panchanathan, S. (2017). Deep hashing network for unsupervised domain adaptation. In CVPR (pp. 5018–5027).
Wang, J., Lan, C., Liu, C., Ouyang, Y., Qin, T., Lu, W., Chen, Y., Zeng, W., & Yu, P. (2022a). Generalizing to unseen domains: A survey on domain generalization. IEEE Transactions on Knowledge and Data Engineering.
Wang, Y., Liao, S., & Shao, L. (2020). Surpassing real-world source training data: Random 3d characters for generalizable person re-identification. In ACM MM (pp. 3422–3430).
Wang, Z., Luo, Y., Qiu, R., Huang, Z., & Baktashmotlagh, M. (2021). Learning to diversify for single domain generalization. In ICCV (pp. 834–843).
Wang, R., Yi, M., Chen, Z., & Zhu, S. (2022b). Out-of-distribution generalization with causal invariant transformations. In CVPR (pp. 375–385).
Wei, L., Zhang, S., Gao, W., & Tian, Q. (2018). Person transfer gan to bridge domain gap for person re-identification. In CVPR (pp. 79–88).
Xie, C., Ye, H., Chen, F., Liu, Y., Sun, R., & Li, Z. (2020). Risk variance penalization. arXiv:2006.07544
Xu, Q., Zhang, R., Zhang, Y., Wang, Y., & Tian, Q. (2021). A fourier-based framework for domain generalization. In CVPR (pp. 14383–14392).
Yang, X., Feng, F., Ji, W., Wang, M., & Chua, T. S. (2021). Deconfounded video moment retrieval with causal intervention. In SIGIR.
Yang, G., Song, X., Huang, C., Deng, Z., Shi, J., & Zhou, B. (2019). Drivingstereo: A large-scale dataset for stereo matching in autonomous driving scenarios. In CVPR (pp. 899–908).
Yan, C., Gong, B., Wei, Y., & Gao, Y. (2020). Deep multi-view enhancement hashing for image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(4), 1445–1451.
Yang, X., Wang, S., Dong, J., Dong, J., Wang, M., & Chua, T. S. (2022). Video moment retrieval with cross-modal neural architecture search. IEEE Transactions on Image Processing, 31, 1204–1216.
Yang, X., Zhou, P., & Wang, M. (2019). Person reidentification via structural deep metric learning. IEEE Transactions on Neural Networks and Learning Systems, 30(10), 2987–2998.
Yan, C., Pang, G., Bai, X., Liu, C., Ning, X., Gu, L., & Zhou, J. (2021). Beyond triplet loss: Person re-identification with fine-grained difference-aware pairwise loss. IEEE Transactions on Multimedia, 24, 1665–1677.
Yao, C., Jia, Y., Di, H., Li, P., & Wu, Y. (2021). A decomposition model for stereo matching. In CVPR (pp. 6091–6100).
Yu, F., Chen, H., Wang, X., Xian, W., Chen, Y., Liu, F., Madhavan, V., & Darrell, T. (2020). Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In CVPR (pp. 2636–2645).
Yu, Y., Khadivi, S., & Xu, J. (2022). Can data diversity enhance learning generalization? In COLING (pp. 4933–4945).
Yue, X., Zhang, Y., Zhao, S., Sangiovanni-Vincentelli, A., Keutzer, K., & Gong, B. (2019). Domain randomization and pyramid consistency: Simulation-to-real generalization without accessing target domain data. In ICCV (pp. 2100–2110).
Zbontar, J., & LeCun, Y. (2015). Computing the stereo matching cost with a convolutional neural network. In CVPR (pp. 1592–1599).
Zhang, H., Cisse, M., Dauphin, Y. N., & Lopez-Paz, D. (2018). mixup: Beyond empirical risk minimization. In International Conference on Learning Representations.
Zhang, Y., Deng, B., Li, R., Jia, K., & Zhang, L. (2023). Adversarial style augmentation for domain generalization. arXiv:2301.12643
Zhang, P., Dou, H., Yu, Y., & Li, X. (2022b). Adaptive cross-domain learning for generalizable person re-identification. In ECCV (pp. 215–232). Springer.
Zhang, Y., Li, M., Li, R., Jia, K., & Zhang, L. (2022c). Exact feature distribution matching for arbitrary style transfer and domain generalization. In CVPR (pp. 8035–8045).
Zhang, F., Prisacariu, V., Yang, R., & Torr, P.H. (2019). Ga-net: Guided aggregation net for end-to-end stereo matching. In CVPR (pp. 185–194).
Zhang, F., Qi, X., Yang, R., Prisacariu, V., Wah, B., & Torr, P. (2020). Domain-invariant stereo matching networks. In ECCV (pp. 420–439). Springer.
Zhang, A., Ren, W., Liu, Y., & Cao, X. (2023). Lightweight image super-resolution with superpixel token interaction. In ICCV (pp. 12728–12737).
Zhang, J., Wang, X., Bai, X., Wang, C., Huang, L., Chen, Y., Gu, L., Zhou, J., Harada, T., & Hancock, E. R. (2022a). Revisiting domain generalized stereo matching networks from a feature consistency perspective. In CVPR (pp. 13001–13011).
Zhang, F., & Wah, B. W. (2017). Fundamental principles on learning new features for effective dense matching. IEEE Transactions on Image Processing, 27(2), 822–836.
Zhao, Y., Zhong, Z., Yang, F., Luo, Z., Lin, Y., Li, S., & Sebe, N. (2021). Learning to generalize unseen domains via memory-based multi-source meta-learning for person re-identification. In CVPR (pp. 6277–6286).
Zhao, Y., Zhong, Z., Zhao, N., Sebe, N., & Lee, G.H. (2022). Style-hallucinated dual consistency learning for domain generalized semantic segmentation. In ECCV (pp. 535–552). Springer.
Zhao, Y., Zhong, Z., Zhao, N., Sebe, N., & Lee, G. H. (2024). Style-hallucinated dual consistency learning: A unified framework for visual domain generalization. International Journal of Computer Vision, 132(3), 837–853.
Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., & Tian, Q. (2015). Scalable person re-identification: A benchmark. In ICCV (pp. 1116–1124).
Zhong, Z., Zheng, L., Cao, D., & Li, S. (2017). Re-ranking person re-identification with k-reciprocal encoding. In CVPR (pp. 1318–1327).
Zhong, Z., Zheng, L., Kang, G., Li, S., & Yang, Y. (2020a). Random erasing data augmentation. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 34, pp. 13001–13008).
Zhong, Z., Zhao, Y., Lee, G. H., & Sebe, N. (2022). Adversarial style augmentation for domain generalized urban-scene segmentation. NeurIPS, 35, 338–350.
Zhong, Z., Zheng, L., Luo, Z., Li, S., & Yang, Y. (2020b). Learning to adapt invariance in memory for person re-identification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(8), 2723–2738.
Zhong, Z., Zheng, L., Zheng, Z., Li, S., & Yang, Y. (2018). Camstyle: A novel data augmentation method for person re-identification. IEEE Transactions on Image Processing, 28(3), 1176–1190.
Zhou, S., Guo, D., Li, J., Yang, X., & Wang, M. (2023). Exploring sparse spatial relation in graph inference for text-based VQA. IEEE Transactions on Image Processing.
Zhou, K., Yang, Y., Hospedales, T., & Xiang, T. (2020). Learning to generate novel domains for domain generalization. In ECCV (pp. 561–578). Springer.
Zhou, K., Yang, Y., Qiao, Y., & Xiang, T. (2021b). Domain generalization with mixstyle. arXiv:2104.02008
Zhou, S., Guo, D., Yang, X., Dong, J., & Wang, M. (2024). Graph pooling inference network for text-based VQA. ACM Transactions on Multimedia Computing, Communications, and Applications, 20(4), 1–21.
Zhou, K., Yang, Y., Cavallaro, A., & Xiang, T. (2021a). Learning generalisable omni-scale representations for person re-identification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(9), 5056–5069.
Zhuang, Z., Wei, L., Xie, L., Zhang, T., Zhang, H., Wu, H., Ai, H., & Tian, Q. (2020). Rethinking the distribution gap of person re-identification with camera-based batch normalization. In ECCV (pp. 140–157). Springer.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (NSFC) under Grant U22A2094, Grant 62272435, and Grant 72188101.
Additional information
Communicated by Zhun Zhong.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yang, X., Chang, T., Zhang, T. et al. Learning Hierarchical Visual Transformation for Domain Generalizable Visual Matching and Recognition. Int J Comput Vis 132, 4823–4849 (2024). https://doi.org/10.1007/s11263-024-02106-7