Abstract
Modern deep learning systems are data-hungry. Learning from web data is a feasible solution, but it inevitably introduces label noise, which can degrade the performance of deep neural networks. Sample selection is an effective way to deal with label noise; the key is to separate clean samples according to some criterion. Previous methods focus on the small-loss criterion, which regards small-loss samples as clean. However, this strategy depends on the learning dynamics of each data instance, and some noisy samples are still memorized because their corrupted learning patterns occur frequently. To tackle this problem, a training-free surrogate model is preferable, since it is free from the effect of memorization. In this work, we propose to leverage the vision-language model CLIP as such a surrogate to filter noisy samples automatically. Through its ability to align text and images, CLIP brings external knowledge that facilitates the selection of clean samples. Furthermore, a margin adaptive loss is designed to regularize the selection bias introduced by CLIP, providing robustness to label noise. We validate the effectiveness of the proposed method on both real-world and synthetic noisy datasets. Our method achieves significant improvement, with CLIP not involved during the inference stage.
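The selection idea described above can be illustrated with a minimal sketch: a sample is kept as clean when CLIP's zero-shot prediction (the class prompt with the highest cosine similarity to the image) agrees with the given, possibly noisy, label. This is an assumption-laden toy, not the paper's implementation: it presumes image and class-prompt features have already been extracted with a CLIP encoder, and the function name `select_clean` is illustrative.

```python
import numpy as np

def select_clean(image_feats, text_feats, noisy_labels):
    """Mark a sample clean when the nearest class prompt (by cosine
    similarity) matches its given, possibly noisy, label."""
    # L2-normalise so that the dot product equals cosine similarity
    img = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    txt = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    sims = img @ txt.T                    # (N, C) image-to-prompt similarities
    zero_shot_pred = sims.argmax(axis=1)  # CLIP's zero-shot label guess
    return zero_shot_pred == np.asarray(noisy_labels)

# Toy example: 3 samples, 2 classes; the last sample carries a wrong label.
image_feats = np.array([[1.0, 0.1], [0.1, 1.0], [1.0, 0.0]])
text_feats = np.array([[1.0, 0.0], [0.0, 1.0]])  # prompts for classes 0 and 1
noisy_labels = [0, 1, 1]                          # third label is corrupted
mask = select_clean(image_feats, text_feats, noisy_labels)
print(mask)  # → [ True  True False]
```

Because the surrogate is never trained on the noisy dataset, this criterion cannot memorize corrupted patterns; the margin adaptive loss then counteracts any class-wise bias that CLIP's zero-shot predictions introduce.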
Data availability
The datasets analyzed during the current study are available at https://www.image-net.org/, https://www.cs.toronto.edu/~kriz/cifar.html, http://noisylabels.com/, https://google.github.io/controlled-noisy-web-labels/ and https://data.vision.ee.ethz.ch/cvl/webvision/dataset2017.html. No new datasets were generated.
Acknowledgements
This work was supported by the National Science and Technology Major Project (2022ZD0117802) and partially by the Fundamental Research Funds for the Central Universities (Grant No. 226-2024-00058).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Communicated by Giorgos Tolias.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liang, C., Zhu, L., Shi, H. et al. Combating Label Noise with a General Surrogate Model for Sample Selection. Int J Comput Vis 133, 3166–3179 (2025). https://doi.org/10.1007/s11263-024-02324-z