An Empirical Study on Multi-domain Robust Semantic Segmentation

Liu, Yajie; Ge, Pu; Liu, Qingjie; Fan, Shichao; Wang, Yunhong

doi:10.1007/s11263-024-02100-z

Published: 10 May 2024

Volume 132, pages 4289–4304, (2024)

International Journal of Computer Vision

Yajie Liu¹, Pu Ge²†, Qingjie Liu¹ (ORCID: orcid.org/0000-0002-5181-6451), Shichao Fan¹ & Yunhong Wang¹

Abstract

How to effectively leverage the many existing datasets to train a robust, high-performance model is of great significance for many practical applications. However, a model trained on a naive merge of different datasets tends to perform poorly because of annotation conflicts and domain divergence. In this paper, we attempt to train a unified model that performs well across domains on several popular segmentation datasets. Through extensive experiments, we conduct a comprehensive analysis of how various training schemes and model selection affect multi-domain learning. Based on this analysis, we propose a robust solution that consistently improves model performance across different domains. Our solution ranked 2nd in the semantic segmentation task of the Robust Vision Challenge (RVC) 2022, while using a training set only one third the size of the one used by the 1st-place model.
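To make the term "annotation conflicts" concrete, the sketch below merges two hypothetical toy datasets whose label spaces overlap by name but not by meaning. It is not the authors' method: the class lists, the `COVERS` mapping, and the functions `naive_label_space`, `find_conflicts` and `remap` are illustrative assumptions. Dataset "A" separates riders from pedestrians, while dataset "B" folds riders into "person", so a name-based merge would supervise the same kind of pixel as "rider" in one dataset and "person" in the other.

```python
"""Toy illustration of annotation conflicts when merging segmentation datasets.

All class lists and mappings are hypothetical examples, not real dataset taxonomies.
"""

# Hypothetical per-dataset class lists (list index = label ID in that dataset).
DATASETS = {
    "A": ["road", "person", "rider"],  # distinguishes riders from pedestrians
    "B": ["road", "person"],           # B's "person" also covers riders
}

# Hypothetical manual mapping: each dataset class -> unified classes it covers.
COVERS = {
    "A": {"road": {"road"}, "person": {"person"}, "rider": {"rider"}},
    "B": {"road": {"road"}, "person": {"person", "rider"}},
}


def naive_label_space(datasets):
    """Naive merge: union of class names, assuming equal names mean equal concepts."""
    return sorted({name for classes in datasets.values() for name in classes})


def find_conflicts(covers):
    """List (dataset, class, unified targets) where a class covers more than
    the single unified class that shares its name."""
    conflicts = []
    for ds, mapping in covers.items():
        for name, targets in mapping.items():
            if targets != {name}:
                conflicts.append((ds, name, sorted(targets)))
    return conflicts


def remap(label_ids, dataset, datasets, covers, unified):
    """Map per-pixel label IDs of one dataset to sorted lists of admissible unified IDs."""
    index = {name: i for i, name in enumerate(unified)}
    classes = datasets[dataset]
    return [sorted(index[t] for t in covers[dataset][classes[i]]) for i in label_ids]


unified = naive_label_space(DATASETS)
print(unified)                                        # ['person', 'rider', 'road']
print(find_conflicts(COVERS))                         # [('B', 'person', ['person', 'rider'])]
print(remap([0, 1], "B", DATASETS, COVERS, unified))  # [[2], [0, 1]]
```

After remapping, a "person" pixel from dataset B no longer has a single target but a set of admissible unified classes. One known way to train on such sets is to maximize the summed probability over the admissible classes instead of using one-hot cross-entropy; the abstract does not state whether the authors' solution does this.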

Notes

All datasets used in our study are publicly available.

Author information

Pu Ge has contributed equally to this work.

Authors and Affiliations

State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
Yajie Liu, Qingjie Liu, Shichao Fan & Yunhong Wang
Hangzhou Innovation Institute, Beihang University, Hangzhou, China
Pu Ge

Corresponding author

Correspondence to Qingjie Liu.

Additional information

Communicated by Oliver Zendel.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, Y., Ge, P., Liu, Q. et al. An Empirical Study on Multi-domain Robust Semantic Segmentation. Int J Comput Vis 132, 4289–4304 (2024). https://doi.org/10.1007/s11263-024-02100-z

Received: 30 November 2022
Accepted: 22 April 2024
Published: 10 May 2024
Version of record: 10 May 2024
Issue date: October 2024
DOI: https://doi.org/10.1007/s11263-024-02100-z

Part of a collection:

Special Issue on Robust Vision