
WeakCLIP: Adapting CLIP for Weakly-Supervised Semantic Segmentation

Published in: International Journal of Computer Vision

Abstract

Contrastive Language-Image Pre-training (CLIP) achieves great success in various computer vision tasks and also presents an opportune avenue for enhancing weakly-supervised image understanding with its large-scale pre-trained knowledge. As an effective way to reduce reliance on pixel-level human-annotated labels, weakly-supervised semantic segmentation (WSSS) aims to refine the class activation map (CAM) into high-quality pseudo masks, but it heavily relies on inductive biases such as hand-crafted priors and digital image processing methods. To leverage the vision-language pre-trained model, i.e., CLIP, we propose a novel text-to-pixel matching paradigm for WSSS. However, directly applying CLIP to WSSS is challenging due to three critical problems: (1) the task gap between contrastive pre-training and WSSS CAM refinement, (2) the lack of text-to-pixel modeling to fully utilize the pre-trained knowledge, and (3) insufficient detail owing to the \(\frac{1}{16}\) down-sampled feature resolution of ViT. Thus, we propose WeakCLIP to address these problems and transfer the pre-trained knowledge of CLIP to WSSS. Specifically, we first address the task gap by proposing a pyramid adapter and learnable prompts to extract WSSS-specific representations. We then design a co-attention matching module to model text-to-pixel relationships. Finally, the pyramid adapter and a text-guided decoder gather multi-level information and integrate it with text guidance hierarchically. WeakCLIP provides an effective and parameter-efficient way to transfer CLIP knowledge to refine CAM. Extensive experiments demonstrate that WeakCLIP achieves state-of-the-art WSSS performance on standard benchmarks, i.e., 74.0% mIoU on the val set of PASCAL VOC 2012 and 46.1% mIoU on the val set of COCO 2014. The source code and model checkpoints are released at https://github.com/hustvl/WeakCLIP.
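To make the text-to-pixel matching idea concrete, the sketch below shows how per-class activation maps could be obtained by cross-attending CLIP text embeddings over pixel-level features and then scoring cosine similarity. This is a minimal, hypothetical PyTorch illustration under assumed shapes (class count C, feature dimension D) and a single attention layer; the paper's actual co-attention matching module, pyramid adapter, and text-guided decoder are more elaborate.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextToPixelMatching(nn.Module):
    """Minimal sketch of text-to-pixel matching: text embeddings (one per
    class) cross-attend over pixel features, then cosine similarity yields
    per-class activation maps. Names and dimensions are illustrative, not
    the paper's exact design."""

    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        # Text queries attend to pixel features (one direction of co-attention).
        self.text_to_pixel_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Learnable temperature, initialized like CLIP's logit scale.
        self.logit_scale = nn.Parameter(torch.tensor(1 / 0.07).log())

    def forward(self, text_emb: torch.Tensor, pixel_feat: torch.Tensor) -> torch.Tensor:
        """
        text_emb:   (B, C, D)   CLIP text embeddings for C classes
        pixel_feat: (B, H*W, D) pixel features from the (adapted) image encoder
        returns:    (B, C, H*W) per-class activation maps
        """
        # Refine text embeddings with visual context via cross-attention.
        text_ctx, _ = self.text_to_pixel_attn(text_emb, pixel_feat, pixel_feat)
        text_emb = F.normalize(text_emb + text_ctx, dim=-1)
        pixel_feat = F.normalize(pixel_feat, dim=-1)
        # Cosine similarity between every class embedding and every pixel.
        return self.logit_scale.exp() * text_emb @ pixel_feat.transpose(1, 2)
```

For a 448 x 448 input with ViT's stride-16 patches, pixel_feat would carry 28 x 28 = 784 tokens, and the (B, C, 784) output reshaped to (B, C, 28, 28) plays the role of a refined CAM.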




Author information


Corresponding author

Correspondence to Xinggang Wang.

Additional information

Communicated by Gunhee Kim.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhu, L., Wang, X., Feng, J. et al. WeakCLIP: Adapting CLIP for Weakly-Supervised Semantic Segmentation. Int J Comput Vis 133, 1085–1105 (2025). https://doi.org/10.1007/s11263-024-02224-2

