
Weakly Supervised Training of Universal Visual Concepts for Multi-domain Semantic Segmentation

  • Published in: International Journal of Computer Vision

Abstract

Deep supervised models have an unprecedented capacity to absorb large quantities of training data. Hence, training on multiple datasets becomes a method of choice towards strong generalization in usual scenes and graceful performance degradation in edge cases. Unfortunately, popular datasets often have discrepant granularities. For instance, the Cityscapes road class subsumes all driving surfaces, while Vistas defines separate classes for road markings, manholes, etc. Furthermore, many datasets have overlapping labels. For instance, pickups are labeled as trucks in VIPER, cars in Vistas, and vans in ADE20k. We address this challenge by considering labels as unions of universal visual concepts. This allows seamless and principled learning on multi-domain dataset collections without requiring any relabeling effort. Our method improves within-dataset and cross-dataset generalization, and provides an opportunity to learn visual concepts which are not separately labeled in any of the training datasets. Experiments reveal competitive or state-of-the-art performance on two multi-domain dataset collections and on the WildDash 2 benchmark.
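The core idea in the abstract, treating each dataset label as a union of universal visual concepts, can be illustrated with a small numerical sketch. In the snippet below, the `LABEL_TO_CONCEPTS` mapping, the concept indices, and the label names are illustrative assumptions rather than the paper's actual taxonomy; the loss is simply the negative log of the total softmax probability mass over the concepts that a dataset label subsumes.

```python
import math

# Hypothetical universal concepts (illustrative, not the paper's taxonomy):
# 0 = drivable surface, 1 = road marking, 2 = car, 3 = pickup
LABEL_TO_CONCEPTS = {
    "cityscapes/road": [0, 1],  # Cityscapes 'road' subsumes markings
    "vistas/road":     [0],     # Vistas separates markings from the road surface
    "vistas/marking":  [1],
}

def softmax(logits):
    """Numerically stable softmax over a list of per-concept logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def union_nll(logits, label):
    """Negative log of the summed probability over the label's concept union.

    A coarse label is satisfied if the predicted probability mass lands on
    ANY of the universal concepts it subsumes.
    """
    probs = softmax(logits)
    mass = sum(probs[i] for i in LABEL_TO_CONCEPTS[label])
    return -math.log(mass)

# One pixel's universal logits (toy values):
logits = [2.0, 1.0, 0.0, -1.0]
print(union_nll(logits, "cityscapes/road"))  # coarse label: small loss
print(union_nll(logits, "vistas/road"))      # fine label: larger loss
```

Because the coarse Cityscapes road label covers both fine concepts, its loss can never exceed the loss of either finer Vistas label for the same prediction; this is what lets coarse and fine datasets supervise the same universal classifier without relabeling.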



Data Availability Statement

We perform our experiments on the following publicly available datasets: ADE20k (Zhou et al., 2017), BDD (Yu et al., 2018), CamVid (Badrinarayanan et al., 2017), Cityscapes (Cordts et al., 2016), COCO (Lin et al., 2014), IDD (Varma et al., 2019), KITTI (Geiger et al., 2013), MSeg (Lambert et al., 2020), SUN RGB-D (Song et al., 2015), ScanNet (Dai et al., 2017), VIPER (Richter et al., 2017), Vistas (Neuhold et al., 2017), and WildDash 2 (Zendel et al., 2018). Our universal taxonomy for these datasets is available online (Bevandic, 2022).

References

  • Badrinarayanan, V., Kendall, A., & Cipolla, R. (2017). SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12), 2481–2495.

  • Bevandic, P. (2022). Universal taxonomies for semantic segmentation (source code). Accessed 02 Dec 2022. https://github.com/UNIZG-FER-D307/universal_taxonomies.

  • Bevandic, P., & Segvic, S. (2022). Automatic universal taxonomies for multi-domain semantic segmentation. In: BMVC

  • Bevandić, P., Oršić, M., Grubišić, I., Šarić, J., & Šegvić, S. (2022). Multi-domain semantic segmentation with overlapping labels. In: Proceedings of the IEEE/CVF Winter conference on applications of computer vision (WACV), pp. 2615–2624.

  • Bevandić, P., Krešo, I., Oršić, M., & Šegvić, S. (2022). Dense open-set recognition based on training with noisy negative images. Image and Vision Computing, 124, 104490. https://doi.org/10.1016/j.imavis.2022.104490

  • Biase, G.D., Blum, H., Siegwart, R., & Cadena, C. (2021). Pixel-wise anomaly detection in complex driving scenes. In: Computer vision and pattern recognition, CVPR

  • Blum, H., Sarlin, P., Nieto, J. I., Siegwart, R., & Cadena, C. (2021). The fishyscapes benchmark: Measuring blind spots in semantic segmentation. International Journal of Computer Vision, 129(11), 3119–3135.

  • Chan, R., Lis, K., Uhlemeyer, S., Blum, H., Honari, S., Siegwart, R., et al. (2021). SegmentMeIfYouCan: A benchmark for anomaly segmentation. In: Vanschoren, J., & Yeung, S. (Eds.), NeurIPS

  • Chan, R., Rottmann, M., & Gottschalk, H. (2021). Entropy maximization and meta classification for out-of-distribution detection in semantic segmentation. In: International conference on computer vision, ICCV

  • Cheng, B., Collins, M.D., Zhu, Y., Liu, T., Huang, T.S., Adam, H., et al. (2020). Panoptic-deeplab: A simple, strong, and fast baseline for bottom-up panoptic segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 12475–12485.

  • Cheng, B., Misra, I., Schwing, A.G., Kirillov, A., & Girdhar, R. (2022). Masked-attention mask transformer for universal image segmentation. In: CVPR, pp. 1280–1289.

  • Cheng, B., Schwing, A.G., & Kirillov, A. (2021). Per-pixel classification is not all you need for semantic segmentation. In: NeurIPS

  • Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2017). Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4), 834–848.

  • Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., et al. (2016). The cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3213–3223.

  • Cour, T., Sapp, B., & Taskar, B. (2011). Learning from partial labels. Journal of Machine Learning Research, 12, 1501–1536.

  • Dai, A., Chang, A.X., Savva, M., Halber, M., Funkhouser, T., & Nießner, M. (2017). ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. In: CVPR

  • Everingham, M., Gool, L., Williams, C. K., Winn, J., & Zisserman, A. (2010). The Pascal visual object classes (VOC) challenge. International Journal of Computer Vision, 88, 303–338.

  • Fourure, D., Emonet, R., Fromont, E., Muselet, D., Neverova, N., Trémeau, A., et al. (2017). Multi-task, multi-domain learning: application to semantic segmentation and pose regression. Neurocomputing.

  • Galleguillos, C., & Belongie, S. (2010). Context based object categorization: A critical survey. Computer Vision and Image Understanding, 06(114), 712–722. https://doi.org/10.1016/j.cviu.2010.02.004

  • Geiger, A., Lenz, P., Stiller, C., & Urtasun, R. (2013). Vision meets robotics: The KITTI dataset. International Journal of Robotics Research, 32(11), 1231–1237.

  • Gupta, A., Dollar, P., & Girshick, R. (2019). LVIS: A dataset for large vocabulary instance segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016).Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778.

  • Huang, G., Liu, Z., Pleiss, G., Van Der Maaten, L., & Weinberger, K. (2019). Convolutional networks with dense connectivity. IEEE Transactions on Pattern Analysis and Machine Intelligence.

  • Kalluri, T., Varma, G., Chandraker, M., & Jawahar, C. (2019). Universal semi-supervised semantic segmentation. In: Proceedings of the IEEE international conference on computer vision, pp. 5259–5270.

  • Kim, D., Tsai, Y., Suh, Y., Faraki, M., Garg, S., Chandraker, M., et al. (2022). Learning semantic segmentation from multiple datasets with label shifts. In: ECCV

  • Krešo, I., Krapac, J., & Šegvić, S. (2021). Efficient ladder-style DenseNets for semantic segmentation of large images. IEEE Transactions on Intelligent Transportation Systems, 22(8), 4951–4961.

  • Lambert, J., Liu, Z., Sener, O., Hays, J., Koltun, V. (2020). MSeg: A composite dataset for multi-domain semantic segmentation. In: CVPR

  • Lee, D.H. (2013). Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In: ICML Workshop on Challenges in Representation Learning (WREPL)

  • Li, L., Zhou, T., Wang, W., Li, J., & Yang, Y. (2022). Deep hierarchical semantic segmentation. In: 2022 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp. 1236–1247.

  • Liang, X., Zhou, H., & Xing, E. (2018). Dynamic-structured semantic propagation network. In: CVPR, pp. 752–761.

  • Lin, T., Maire, M., Belongie, S.J., Hays, J., Perona, P., Ramanan, D., et al. (2014). Microsoft COCO: Common Objects in Context. In: ECCV, pp. 740–755.

  • Liu, Y., Ge, P., Liu, Q., Fan, S., & Wang, Y. (2022). An empirical study on multi-domain robust semantic segmentation. arXiv preprint arXiv:2212.04221.

  • Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3431–3440.

  • Masaki, S., Hirakawa, T., Yamashita, T., & Fujiyoshi, H. (2021). Multi-Domain Semantic-Segmentation using Multi-Head Model. In: 2021 IEEE international intelligent transportation systems conference (ITSC), pp. 2802–2807.

  • McClosky, D., Charniak, E., & Johnson, M. (2006). Effective self-training for parsing. In: NAACL, pp. 152–159.

  • Meletis, P., & Dubbelman, G. (2018). Training of Convolutional networks on multiple heterogeneous datasets for street scene semantic segmentation. In: Intelligent vehicles symposium, pp. 1045–1050.

  • Mohan, R., & Valada, A. (2020). EfficientPS: Efficient panoptic segmentation. International Journal of Computer Vision, 129, 1551–1579.

  • Neuhold, G., Ollmann, T., Rota Bulò, S., & Kontschieder, P. (2017). Mapillary Vistas Dataset for Semantic Understanding of Street Scenes. In: ICCV, pp. 5000–5009.

  • Oršić, M., & Šegvić, S. (2021). Efficient semantic segmentation with pyramidal fusion. Pattern Recognition, 107611.

  • Oršić, M., Bevandić, P., Grubišić, I., Šarić, J., & Šegvić, S. (2020). Multi-domain semantic segmentation with pyramidal fusion. arXiv preprint arXiv:2009.01636 (CVPR Workshop, Robust Vision Challenge).

  • Porzi, L., Bulò, S.R., Colovic, A., & Kontschieder, P. (2019). Seamless scene segmentation. In: CVPR, pp. 8277–8286.

  • Richter, S. R., Hayder, Z., & Koltun, V. (2017). Playing for Benchmarks. In: ICCV, pp. 2232–2241.

  • Robust Vision Challenge. Accessed 02 Dec 2022. http://www.robustvision.net/index.php.

  • Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention. Springer, pp. 234–241

  • Rota Bulò, S., Porzi, L., & Kontschieder, P. (2018). In-place activated batchnorm for memory-optimized training of DNNS. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5639–5647.

  • Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L. C. (2018). MobileNetV2: Inverted residuals and linear bottlenecks. In: CVPR, pp. 4510–4520.

  • Schwartz, R., Dodge, J., Smith, N. A., & Etzioni, O. (2020). Green AI. Communications of the ACM, 63(12), 54–63.

  • Shelhamer, E., Long, J., & Darrell, T. (2017). Fully Convolutional Networks for Semantic Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(4), 640–651. https://doi.org/10.1109/TPAMI.2016.2572683

  • Song, S., Lichtenberg, S.P., & Xiao, J. (2015). SUN RGB-D: A RGB-D scene understanding benchmark suite. In: CVPR, pp. 567–576.

  • Sun, C., Shrivastava, A., Singh, S., & Gupta, A. (2017). Revisiting unreasonable effectiveness of data in deep learning era. In: ICCV, pp. 843–852.

  • Uijlings, J.R.R., Mensink, T., Ferrari, V. (2022). The missing link: Finding label relations across datasets. In: ECCV, pp. 540–556.

  • Varma, G., Subramanian, A., Namboodiri, A.M., Chandraker, M., & Jawahar, C.V. (2019). IDD: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: WACV, pp. 1743–1751.

  • Xiao, J., Xu, Z., Lan, S., Yu, Z., Yuille, A., & Anandkumar, A. (2022). 1st place solution of the robust vision challenge 2022 semantic segmentation track. CoRR. abs/2210.12852.

  • Yin, W., Liu, Y., Shen, C., van den Hengel, A., & Sun, B. (2022). The devil is in the labels: Semantic segmentation from sentences. CoRR. abs/2202.02002.

  • Yu, F., & Koltun, V. (2016). Multi-scale context aggregation by dilated convolutions. In: International conference on learning representations (ICLR)

  • Yu, F., Xian, W., Chen, Y., Liu, F., Liao, M., Madhavan, V., et al. (2018). BDD100K: A diverse driving video database with scalable annotation tooling. arXiv:1805.04687.

  • Zendel, O., Honauer, K., Murschitz, M., Steininger, D., & Fernandez Dominguez, G. (2018). WildDash: Creating hazard-aware benchmarks. In: ECCV

  • Zendel, O., Schörghuber, M., Rainer, B., Murschitz, M., & Beleznai, C. (2022). Unifying panoptic segmentation for autonomous driving. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp. 21351–21360.

  • Zhao, H., Qi, X., Shen, X., Shi, J., & Jia, J. (2018). ICNet for real-time semantic segmentation on high-resolution images. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., (Eds.), Proceedings of 15th European conference computer vision—ECCV 2018, Munich, Germany, September 8-14, 2018, Part III. vol. 11207 of Lecture Notes in Computer Science. Springer; pp. 418–434.

  • Zhao, X., Schulter, S., Sharma, G., Tsai, Y., Chandraker, M., & Wu, Y. (2020). Object detection with a unified label space from multiple datasets. In: ECCV, pp. 178–193.

  • Zhao, H., Shi, J., Qi, X., Wang, X., & Jia, J. (2017). Pyramid scene parsing network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2881–2890.

  • Zhen, M., Wang, J., Zhou, L., Fang, T., & Quan, L. (2019). Learning fully dense neural networks for image semantic segmentation. In: AAAI

  • Zhou, X., Koltun, V., & Krähenbühl, P. (2022). Simple multi-dataset detection. In: CVPR

  • Zhou, T., Wang, W., Konukoglu, E., & Van Gool, L. (2022). Rethinking semantic segmentation: A prototype view. In: CVPR, pp. 2572–2583.

  • Zhou, B., Zhao, H., Puig, X., Fidler, S., Barriuso, A., & Torralba, A. (2017). Scene parsing through ade20k dataset. In: CVPR, pp. 633–641.

  • Zhu, Y., Sapra, K., Reda, F.A., Shih, K.J., Newsam, S., Tao, A., et al. (2019). Improving semantic segmentation via video propagation and label relaxation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 8856–8865.

  • Zlateski, A., Jaroensri, R., Sharma, P., & Durand F. (2018). On the importance of label quality for semantic segmentation. In: CVPR, pp. 1479–1487.

Download references

Acknowledgements

This work has been supported by Croatian Science Foundation grant IP-2020-02-5851 ADEPT, by NVIDIA Academic Hardware Grant Program, by European Regional Development Fund grant KK.01.1.1.01.0009 DATACROSS and by VSITE College for Information Technologies who provided access to 6 GPU Tesla-V100 32GB.

Author information

Corresponding author

Correspondence to Petra Bevandić.

Additional information

Communicated by Oliver Zendel.


About this article

Cite this article

Bevandić, P., Oršić, M., Šarić, J. et al. Weakly Supervised Training of Universal Visual Concepts for Multi-domain Semantic Segmentation. Int J Comput Vis 132, 2450–2472 (2024). https://doi.org/10.1007/s11263-024-01986-z

