Abstract
As a vital problem in pattern analysis and machine intelligence, Unsupervised Domain Adaptation (UDA) attempts to transfer an effective feature learner from a labeled source domain to an unlabeled target domain. Inspired by the success of the Transformer, several recent advances in UDA adopt pure transformers as network architectures, but such a direct application captures only patch-level information and lacks interpretability. To address these issues, we propose the Domain-Transformer (DoT), whose domain-level attention mechanism captures the long-range correspondence between cross-domain samples. On the theoretical side, we provide a mathematical understanding of DoT: (1) we connect the domain-level attention with optimal transport theory, which provides interpretability from the viewpoint of Wasserstein geometry; (2) from the perspective of learning theory, we derive Wasserstein distance-based generalization bounds, which explain the effectiveness of DoT for knowledge transfer. On the methodological side, DoT integrates the domain-level attention with manifold structure regularization, which together characterize sample-level correspondence and locality consistency of the cross-domain cluster structures. Moreover, the domain-level attention mechanism can be used as a plug-and-play module, so DoT can be implemented under different neural network architectures. Instead of explicitly modeling the distribution discrepancy at the domain level or class level, DoT learns transferable features under the guidance of long-range correspondence, so it is free of pseudo-labels and explicit domain discrepancy optimization. Extensive experimental results on several benchmark datasets validate the effectiveness of DoT.
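To make the connection between domain-level attention and optimal transport concrete, the following is a minimal sketch of how such an attention map could be formed from an entropic optimal-transport coupling between source and target features. It is illustrative only: the function names (`sinkhorn_coupling`, `domain_attention`), the uniform marginals, the squared-Euclidean cost, and the toy dimensions are assumptions made for exposition, not the authors' implementation.

```python
import numpy as np

def sinkhorn_coupling(Xs, Xt, eps=0.1, n_iters=100):
    """Entropic optimal-transport coupling between source features Xs (ns, d)
    and target features Xt (nt, d), computed with Sinkhorn iterations."""
    ns, nt = Xs.shape[0], Xt.shape[0]
    # Pairwise squared-Euclidean cost, rescaled to [0, 1] for numerical stability.
    cost = ((Xs[:, None, :] - Xt[None, :, :]) ** 2).sum(-1)
    cost = cost / cost.max()
    K = np.exp(-cost / eps)                               # Gibbs kernel
    a = np.full(ns, 1.0 / ns)                             # uniform source marginal
    b = np.full(nt, 1.0 / nt)                             # uniform target marginal
    u = np.ones(ns)
    for _ in range(n_iters):                              # alternating marginal scaling
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]                    # coupling T of shape (ns, nt)

def domain_attention(Xs, Xt, eps=0.1):
    """Treat each row of the coupling as attention weights of one source sample
    over all target samples, producing target-attended source representations."""
    T = sinkhorn_coupling(Xs, Xt, eps)
    attn = T / T.sum(axis=1, keepdims=True)               # rows sum to one
    return attn @ Xt

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Xs = rng.normal(size=(8, 16))                         # toy source features
    Xt = rng.normal(loc=0.5, size=(10, 16))               # toy (shifted) target features
    print(domain_attention(Xs, Xt).shape)                 # (8, 16)
```

In this reading, each row of the coupling plays the role of an attention distribution of one source sample over the whole target domain, which is what lends the mechanism its optimal-transport interpretation.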
Data availability
The datasets used in this work can be downloaded from the following links: ImageCLEF https://www.imageclef.org/2014/adaptation, Office-31 https://faculty.cc.gatech.edu/~judy/domainadapt/#datasets_code, Office-Home https://www.hemanthdv.org/officeHomeDataset.html, VisDA-2017 https://ai.bu.edu/visda-2017/, DomainNet https://ai.bu.edu/M3SDA/#dataset. The code will be made publicly available upon acceptance.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (Grant No. 62376291), in part by the Guangdong Basic and Applied Basic Research Foundation (2023B1515020004), in part by the Science and Technology Program of Guangzhou (2024A04J6413), in part by the Fundamental Research Funds for the Central Universities, Sun Yat-sen University (24xkjc013), in part by the Guangdong Province Key Laboratory of Computational Science at Sun Yat-sen University (2020B1212060032), in part by the Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, and in part by the Hong Kong Innovation and Technology Commission (InnoHK Project CIMDA).
Additional information
Communicated by Wanli Ouyang.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ren, CX., Zhai, Y., Luo, YW. et al. Towards Unsupervised Domain Adaptation via Domain-Transformer. Int J Comput Vis 132, 6163–6183 (2024). https://doi.org/10.1007/s11263-024-02174-9