Abstract
Contrastive language-image pre-training (CLIP) has achieved great success in various computer vision tasks, and its large-scale pre-trained knowledge presents an opportune avenue for enhancing weakly-supervised image understanding. Weakly-supervised semantic segmentation (WSSS) reduces the reliance on pixel-level human annotation by refining the class activation map (CAM) into high-quality pseudo masks, but existing refinement methods rely heavily on inductive biases such as hand-crafted priors and digital image processing. We instead propose a novel text-to-pixel matching paradigm for WSSS that leverages a vision-language pre-trained model, i.e., CLIP. However, directly applying CLIP to WSSS is challenging due to three critical problems: (1) the task gap between contrastive pre-training and WSSS CAM refinement, (2) the lack of text-to-pixel modeling needed to fully utilize the pre-trained knowledge, and (3) insufficient detail owing to the \(\frac{1}{16}\) down-sampled resolution of ViT features. We therefore propose WeakCLIP to address these problems and transfer CLIP's pre-trained knowledge to WSSS. Specifically, we first bridge the task gap with a pyramid adapter and learnable prompts that extract WSSS-specific representations. We then design a co-attention matching module to model text-to-pixel relationships. Finally, the pyramid adapter and a text-guided decoder gather multi-level information and integrate it with text guidance hierarchically. WeakCLIP provides an effective and parameter-efficient way to transfer CLIP knowledge for CAM refinement. Extensive experiments demonstrate that WeakCLIP achieves state-of-the-art WSSS performance on standard benchmarks, i.e., 74.0% mIoU on the val set of PASCAL VOC 2012 and 46.1% mIoU on the val set of COCO 2014.
The source code and model checkpoints are released at https://github.com/hustvl/WeakCLIP.
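The text-to-pixel matching idea sketched in the abstract can be illustrated with a minimal, hedged example: scoring every pixel feature against per-class text embeddings by cosine similarity and normalizing over classes. This is not the paper's co-attention matching module; the function name, temperature value, and random inputs below are illustrative assumptions only.

```python
import numpy as np

def text_to_pixel_similarity(pixel_feats, text_embeds, tau=0.07):
    """Toy text-to-pixel matching (illustrative, not WeakCLIP's module).

    pixel_feats: (H*W, D) per-pixel visual features.
    text_embeds: (C, D) one embedding per class text prompt.
    Returns a (H*W, C) map of per-pixel class probabilities.
    """
    # L2-normalize both sides so the dot product is cosine similarity.
    p = pixel_feats / np.linalg.norm(pixel_feats, axis=-1, keepdims=True)
    t = text_embeds / np.linalg.norm(text_embeds, axis=-1, keepdims=True)
    logits = (p @ t.T) / tau  # tau is a CLIP-style temperature (assumed value)
    # Softmax over the class axis yields soft pseudo-mask seeds per pixel.
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
probs = text_to_pixel_similarity(rng.standard_normal((16, 8)),
                                 rng.standard_normal((3, 8)))
print(probs.shape)  # (16, 3): 16 pixels scored against 3 class prompts
```

Thresholding or argmax over the class axis of such a map is one simple way to obtain pseudo-mask seeds from text guidance.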
Additional information
Communicated by Gunhee Kim.
Cite this article
Zhu, L., Wang, X., Feng, J. et al. WeakCLIP: Adapting CLIP for Weakly-Supervised Semantic Segmentation. Int J Comput Vis 133, 1085–1105 (2025). https://doi.org/10.1007/s11263-024-02224-2