Abstract
While promising results have been achieved in weakly supervised semantic segmentation (WSSS), the limited supervision from image-level tags inevitably induces a reliance on discriminative regions and spurious correlations between target classes and background regions. As a result, the Class Activation Map (CAM) tends to activate only the most discriminative object regions while falsely including many class-related background regions. Without pixel-level supervision, it is difficult to enlarge the foreground activation and suppress such false background activations. In this paper, we propose a novel framework, Cross Language Image Matching with Automatic Context Discovery (CLIMS++), built on the recently introduced Contrastive Language-Image Pre-training (CLIP) model, for WSSS. The core idea of our framework is to introduce natural language supervision to activate more complete object regions and to suppress class-related background regions in CAM. In particular, we design object, background-region, and text-label matching losses that guide the model to activate more reasonable object regions for each category. In addition, we propose to automatically discover spurious correlations between foreground categories and backgrounds, from which a background suppression loss is derived to suppress the activation of class-related backgrounds. Together, these designs enable the proposed CLIMS++ to generate more complete and compact activation maps for target objects. Extensive experiments on the PASCAL VOC 2012 and MS COCO 2014 datasets show that CLIMS++ significantly outperforms previous state-of-the-art methods.
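The loss design summarized above can be illustrated with a minimal sketch. Everything here is a stand-in for exposition only: `encode` is a toy pooling function in place of the CLIP image encoder, `text_fg`/`text_bg` stand in for CLIP text embeddings of the class label and a class-related background phrase, and the shapes and names are hypothetical, not the paper's implementation. The sketch shows the three terms: reward matching between the CAM-masked foreground and the label text, penalize matching between the CAM's complement and the label text, and penalize matching between the foreground and class-related background text.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cosine(a, b):
    # Cosine similarity between two feature vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def encode(img):
    # Toy image "encoder": per-channel mean pooling.
    # In the actual method this would be the frozen CLIP image encoder.
    return img.reshape(img.shape[0], -1).mean(axis=1)

def clims_losses(cam, image, text_fg, text_bg, encode):
    """Sketch of the object/background matching and suppression losses.

    cam:      (H, W) activation map in [0, 1] for one class
    image:    (C, H, W) input image
    text_fg:  embedding of the class-label text (e.g. "a photo of a train")
    text_bg:  embedding of a class-related background text (e.g. "railroad")
    """
    fg_feat = encode(cam * image)          # CAM-masked foreground
    bg_feat = encode((1.0 - cam) * image)  # complement (background) region
    s_otm = cosine(fg_feat, text_fg)  # object-text matching: maximize
    s_btm = cosine(bg_feat, text_fg)  # background-text matching: minimize
    s_cbs = cosine(fg_feat, text_bg)  # class-related background: minimize
    eps = 1e-8
    return (-np.log(sigmoid(s_otm) + eps)
            - np.log(1.0 - sigmoid(s_btm) + eps)
            - np.log(1.0 - sigmoid(s_cbs) + eps))

# Toy data with hypothetical shapes (3-channel 4x4 image).
rng = np.random.default_rng(0)
image = rng.random((3, 4, 4))
cam = rng.random((4, 4))
text_fg = rng.random(3)
text_bg = rng.random(3)
loss = clims_losses(cam, image, text_fg, text_bg, encode)
```

Minimizing such an objective pushes the CAM to cover regions that match the label text (enlarging foreground activation) while leaving out regions that match class-related background text (suppressing spurious activation).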
Data Availability
The datasets adopted in this study are available from the PASCAL VOC 2012 (http://host.robots.ox.ac.uk/pascal/VOC/voc2012) and the MS COCO 2014 (https://cocodataset.org).
Acknowledgements
This work was supported by the National Key R&D Program of China (No. 2024YFF0618403), the National Natural Science Foundation of China under Grant 82261138629, the Guangdong-Macao Science and Technology Innovation Joint Foundation under Grant 2024A0505090003, the Guangdong Provincial Key Laboratory under Grant 2023B1212060076, and the Shenzhen Municipal Science and Technology Innovation Council under Grant JCYJ20220531101412030.
Communicated by Bryan Allen Plummer.
About this article
Cite this article
Xie, J., Deng, S., Hou, X. et al. CLIMS++: Cross Language Image Matching with Automatic Context Discovery for Weakly Supervised Semantic Segmentation. Int J Comput Vis 133, 5569–5588 (2025). https://doi.org/10.1007/s11263-025-02442-2