
An Empirical Study on Multi-domain Robust Semantic Segmentation

Published in: International Journal of Computer Vision

Abstract

How to effectively leverage the abundance of existing datasets to train a robust, high-performance model is of great significance for many practical applications. However, a model trained on a naive merge of different datasets tends to perform poorly due to annotation conflicts and domain divergence. In this paper, we attempt to train a unified model that performs well across domains on several popular segmentation datasets. Through extensive experiments, we conduct a comprehensive analysis of how various training schemes and model choices affect multi-domain learning. Based on this analysis, we propose a robust solution that consistently improves model performance across domains. Our solution ranks 2nd on the RVC 2022 semantic segmentation task while using a training set only 1/3 the size of the one used by the 1st-place model.
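The "naive merge" mentioned above can be made concrete with a small sketch. The snippet below is an illustration under assumed details, not the authors' pipeline: each source dataset's label IDs are remapped into one shared label space and the remapped datasets are concatenated for joint training. `RemappedSegDataset`, `ToySegDataset`, and the label-ID tables are hypothetical placeholders standing in for real datasets such as Cityscapes or ADE20K.

```python
# Minimal sketch of merging several segmentation datasets into a unified label space.
# All dataset classes and ID tables here are hypothetical, for illustration only.
import torch
from torch.utils.data import ConcatDataset, DataLoader, Dataset


class ToySegDataset(Dataset):
    """Synthetic stand-in for a real dataset (e.g. Cityscapes- or ADE20K-like)."""

    def __init__(self, class_ids, n=4):
        self.class_ids = class_ids  # source-specific label IDs present in this dataset
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        image = torch.rand(3, 64, 64)
        # Random mask whose pixel values are drawn from this dataset's own label IDs.
        mask = torch.tensor(self.class_ids)[torch.randint(0, len(self.class_ids), (64, 64))]
        return image, mask


class RemappedSegDataset(Dataset):
    """Wraps a source dataset and maps its label IDs into a unified taxonomy."""

    def __init__(self, source, id_map, ignore_index=255):
        self.source = source              # yields (image, mask) pairs
        self.id_map = id_map              # {source class ID: unified class ID}
        self.ignore_index = ignore_index  # pixels with unmapped IDs are ignored

    def __len__(self):
        return len(self.source)

    def __getitem__(self, idx):
        image, mask = self.source[idx]
        remapped = torch.full_like(mask, self.ignore_index)
        for src_id, uni_id in self.id_map.items():
            remapped[mask == src_id] = uni_id
        return image, remapped


# Two sources with conflicting label IDs, mapped onto the same unified IDs.
merged = ConcatDataset([
    RemappedSegDataset(ToySegDataset([7, 8, 26]), {7: 0, 8: 1, 26: 2}),
    RemappedSegDataset(ToySegDataset([6, 11, 20]), {6: 0, 11: 1, 20: 2}),
])
loader = DataLoader(merged, batch_size=4, shuffle=True)

for images, masks in loader:
    pass  # feed to any segmentation model trained over the unified label space
```

Even with a consistent label mapping, such a merged loader exposes the two problems the abstract names: classes that overlap or conflict across taxonomies, and images drawn from visually divergent domains, which is what the training schemes studied in the paper aim to address.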



Notes

  1. All datasets used in our study are publicly available.


Author information

Corresponding author

Correspondence to Qingjie Liu.

Additional information

Communicated by Oliver Zendel.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Liu, Y., Ge, P., Liu, Q. et al. An Empirical Study on Multi-domain Robust Semantic Segmentation. Int J Comput Vis 132, 4289–4304 (2024). https://doi.org/10.1007/s11263-024-02100-z
