Abstract
Effectively leveraging the many existing datasets to train a robust, high-performance model is of great significance for many practical applications. However, a model trained on a naive merge of different datasets tends to perform poorly because of annotation conflicts and domain divergence. In this paper, we attempt to train a unified model that performs well across domains on several popular segmentation datasets. Through extensive experiments, we conduct a comprehensive analysis of how various training schemes and model choices affect multi-domain learning. Based on this analysis, we propose a robust solution that consistently improves model performance across different domains. Our solution ranks 2nd on the RVC 2022 semantic segmentation task while using a training dataset only one third the size of that used by the 1st-place model.
Notes
All datasets used in our study are publicly available.
Additional information
Communicated by Oliver Zendel.
About this article
Cite this article
Liu, Y., Ge, P., Liu, Q. et al. An Empirical Study on Multi-domain Robust Semantic Segmentation. Int J Comput Vis 132, 4289–4304 (2024). https://doi.org/10.1007/s11263-024-02100-z
DOI: https://doi.org/10.1007/s11263-024-02100-z