Abstract
Training deep neural networks (DNNs) typically requires large-scale, high-quality annotated datasets. However, because precisely annotating vast numbers of training samples is inherently difficult, label noise, i.e., potentially erroneous annotations, is common in practice and detrimental to model performance. To combat the negative impact of label noise, mainstream studies currently follow a pipeline that begins with data sampling and is followed by loss correction. Data sampling aims to partition the original training dataset into clean and noisy subsets, but it often suffers from biased sampling that can mislead models. In addition, loss correction typically requires the noise rate as a priori knowledge, which can be difficult to estimate precisely. To this end, we propose a novel method, Regroup Median Loss Plus Plus (RML++), that addresses both drawbacks. Specifically, the training dataset is partitioned into clean and noisy subsets by a newly designed separation approach that combines prediction consistency with an adaptive threshold to ensure reliable sampling. Moreover, to enable models to learn robustly from the noisy subset, we estimate the losses of noisy training samples using same-class samples from the clean subset. The proposed method then corrects the labels of noisy samples based on model predictions with the regularization of RML++. Compared to state-of-the-art (SOTA) methods, RML++ achieves significant improvements on both synthetic and challenging real-world datasets. The source code is available at https://github.com/Feng-peng-Li/RML-Extension.
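As a rough illustration of the regroup-median idea that RML++ builds on, the NumPy sketch below estimates a robust loss for a single noisily labeled sample from the losses of same-class samples drawn from the clean subset. It is a minimal sketch under assumed inputs: the function regroup_median_loss, its parameters, and the grouping scheme are illustrative and do not reproduce the exact procedure in the paper or the released code.

```python
import numpy as np

def regroup_median_loss(noisy_loss, clean_losses, n_groups=5, seed=None):
    """Robust loss estimate for one noisily labeled sample (illustrative sketch).

    noisy_loss:   loss of the sample computed with its possibly corrupted label.
    clean_losses: 1-D array of losses of same-class samples from the clean subset.
    n_groups:     number of groups used for the regroup (median-of-means style) estimate.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.asarray(clean_losses, dtype=float))
    # Split the clean-sample losses into groups and average each group
    # together with the noisy sample's own loss.
    groups = np.array_split(shuffled, n_groups)
    group_means = [np.mean(np.append(g, noisy_loss)) for g in groups]
    # Taking the median of the group means damps the influence of a corrupted label.
    return float(np.median(group_means))

# Toy usage: a large loss caused by a wrong label is pulled toward the clean-sample statistics.
clean = np.random.default_rng(0).exponential(scale=0.5, size=50)
print(regroup_median_loss(noisy_loss=6.0, clean_losses=clean, seed=0))
```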
References
Arazo, E., Ortego, D., Albert, P., O’Connor, N. E., & McGuinness, K. (2019). Unsupervised label noise modeling and loss correction. In: Proceedings of ICML, pp 312–321
Arpit, D., Jastrzebski, S., Ballas, N., Krueger, D., Bengio, E., Kanwal, M. S., Maharaj, T., Fischer, A., Courville, A., Bengio, Y., & Lacoste-Julien, S. (2017). A closer look at memorization in deep networks. In: Proceedings of ICML, pp 233–242
Bai, Y., Yang, E., Han, B., Yang, Y., Li, J., Mao, Y., Niu, G., & Liu, T. (2021). Understanding and improving early stopping for learning with noisy labels. In: Proceedings of NeurIPS, pp 24392–24403
Berthelot, D., Carlini, N., Goodfellow, I.J., Papernot, N., Oliver, A., & Raffel, C. (2019). Mixmatch: A holistic approach to semi-supervised learning. In: Proceedings of NeurIPS, pp 5050–5060
Bossard, L., Guillaumin, M., & Gool, L. V. (2014). Food-101 - mining discriminative components with random forests. In: Proceedings of ECCV, pp 446–461
Cheng, H., Zhu, Z., Li, X., Gong, Y., Sun, X., & Liu, Y. (2021). Learning with instance-dependent label noise: A sample sieve approach. In: Proceedings of ICLR
Cui, X., Saon, G., Nagano, T., Suzuki, M., Fukuda, T., Kingsbury, B., & Kurata, G. (2022). Improving generalization of deep neural network acoustic models with length perturbation and n-best based label smoothing. In: INTERSPEECH, pp 2638–2642
Frénay, B., & Verleysen, M. (2013). Classification in the presence of label noise: a survey. IEEE Trans Neural Netw Learn Syst, 25(5), 845–869.
Girshick, R. (2015). Fast r-cnn. In: Proceedings of ICCV, pp 1440–1448
Gui, X., Wang, W., & Tian, Z. (2021). Towards understanding deep learning from noisy labels with small-loss criterion. In: Proceedings of IJCAI, pp 2469–2475
Han, B., Yao, Q., Yu, X., Niu, G., Xu, M., Hu, W., Tsang, I. W., & Sugiyama, M. (2018). Co-teaching: Robust training of deep neural networks with extremely noisy labels. In: Proceedings of NeurIPS, pp 8536–8546
He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask r-cnn. In: Proceedings of ICCV, pp 2961–2969
Iscen, A., Valmadre, J., Arnab, A., & Schmid, C. (2022). Learning with neighbor consistency for noisy labels. In: Proceedings of CVPR, pp 4662–4671
Karim, N., Rizve, M. N., Rahnavard, N., Mian, A., & Shah, M. (2022). Unicon: Combating label noise through uniform selection and contrastive learning. In: Proceedings of CVPR, pp 9676–9686
Kim, D., Ryoo, K., Cho, H., & Kim, S. (2024). Splitnet: learnable clean-noisy label splitting for learning with noisy labels. Int J Comput Vis, 1(18), 1–18.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In: Proceedings of NeurIPS, pp 1106–1114
Lecué, G., & Lerasle, M. (2020). Robust machine learning by median-of-means: Theory and practice. Ann Stat, 48(2), 906–931.
Lee, K., He, X., Zhang, L., & Yang, L. (2018). Cleannet: Transfer learning for scalable image classifier training with label noise. In: Proceedings of CVPR, pp 5447–5456
Li, F., Li, K., Tian, J., & Zhou, J. (2024). Regroup median loss for combating label noise. In: Proceedings of AAAI, pp 13474–13482
Li, J., Wong, Y., Zhao, Q., & Kankanhalli, M. S. (2019). Learning to learn from noisy labeled data. In: Proceedings of CVPR, pp 5051–5059
Li, J., Socher, R., Hoi, S. C. (2020). Dividemix: Learning with noisy labels as semi-supervised learning. In: Proceedings of ICLR, pp 824–832
Li, S., Xia, X., Ge, S., & Liu, T. (2022a). Selective-supervised contrastive learning with noisy labels. In: Proceedings of CVPR, pp 316–325
Li, S., Xia, X., Zhang, H., Zhan, Y., Ge, S., & Liu, T. (2022b). Estimating noise transition matrix with label correlations for noisy multi-label learning. In: Proceedings of NeurIPS
Li, S., Xia, X., Zhang, H., Ge, S., & Liu, T. (2023a). Multi-label noise transition matrix estimation with label correlations: Theory and algorithm. CoRR arXiv:2309.12706
Li, W., Wang, L., Li, W., Agustsson, E., & Van Gool, L. (2017). Webvision database: Visual learning and understanding from web data. arXiv preprint arXiv:1708.02862
Li, X., Liu, T., Han, B., Niu, G., & Sugiyama, M. (2021). Provably end-to-end label-noise learning without anchor points. In: Proceedings of ICML, pp 6403–6413
Li, Y., Han, H., Shan, S., & Chen, X. (2023b). DISC: learning from noisy labels via dynamic instance-specific selection and correction. In: Proceedings of CVPR, pp 24070–24079
Liu, S., Niles-Weed, J., Razavian, N., & Fernandez-Granda, C. (2020). Early-learning regularization prevents memorization of noisy labels. In: Proceedings of NeurIPS, pp 20331–20342
Liu, T., & Tao, D. (2016). Classification with noisy labels by importance reweighting. IEEE Trans Pattern Anal Mach Intell, 38(3), 447–461.
Liu, Y., & Guo, H. (2020). Peer loss functions: Learning from noisy labels without knowing noise rates. In: Proceedings of ICML, pp 6226–6236
Müller, R., Kornblith, S., & Hinton, G. E. (2019). When does label smoothing help? In: Proceedings of NeurIPS, pp 4696–4705
Natarajan, N., Dhillon, I. S., Ravikumar, P., & Tewari, A. (2013). Learning with noisy labels. In: Proceedings of NeurIPS, pp 1196–1204
Ortego, D., Arazo, E., Albert, P., O’Connor, N. E., & McGuinness, K. (2021). Multi-objective interpolation training for robustness to label noise. In: Proceedings of CVPR, pp 6606–6615
Patrini, G., Rozza, A., Menon, A. K., Nock, R., & Qu, L. (2017). Making deep neural networks robust to label noise: A loss correction approach. In: Proceedings of CVPR, pp 2233–2241
Rasmussen, C. (1999). The infinite gaussian mixture model. In: Proceedings of NeurIPS, pp 554–560
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In: Proceedings of CVPR, pp 779–788
Scott, C. (2012). Calibrated asymmetric surrogate losses. Electronic J of Stat, 6, 958–992.
Staerman, G., Laforgue, P., Mozharovskyi, P., & d’Alché Buc, F. (2021). When ot meets mom: Robust estimation of wasserstein distance. In: Proceedings of AISTATS, pp 136–144
Sun, H., Wei, Q., Feng, L., Hu, Y., Liu, F., Fan, H., & Yin, Y. (2024). Variational rectification inference for learning with noisy labels. Int J Comput Vis, 1(18), 1–20.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2015). Going deeper with convolutions. In: Proceedings of CVPR, pp 1–9
Tanaka, D., Ikami, D., Yamasaki, T., & Aizawa, K. (2018). Joint optimization framework for learning with noisy labels. In: Proceedings of CVPR, pp 5552–5560
Tarvainen, A., & Valpola, H. (2017). Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In: Proceedings of NeurIPS, pp 1195–1204
Wang, J., Xia, X., Lan, L., Wu, X., Yu, J., Yang, W., Han, B., & Liu, T. (2024). Tackling noisy labels with network parameter additive decomposition. IEEE Trans Pattern Anal Mach Intell, 46(9), 6341–6354.
Wang, Y., Chen, H., Heng, Q., Hou, W., Fan, Y., Wu, Z., Wang, J., Savvides, M., Shinozaki, T., Raj, B., Schiele, B., & Xie, X. (2023). Freematch: Self-adaptive thresholding for semi-supervised learning. In: Proceedings of ICLR
Wei, H., Feng, L., Chen, X., & An, B. (2020). Combating noisy labels by agreement: A joint training method with co-regularization. In: Proceedings of CVPR, pp 13723–13732
Wei, J., Liu, H., Liu, T., Niu, G., Sugiyama, M., & Liu, Y. (2022a). To smooth or not? when label smoothing meets noisy labels. In: Proceedings of ICML, pp 23589–23614
Wei, J., Zhu, Z., Cheng, H., Liu, T., Niu, G., & Liu, Y. (2022b). Learning with noisy labels revisited: A study using real-world human annotations. In: Proceedings of ICLR
Xia, X., Liu, T., Wang, N., Han, B., Gong, C., Niu, G., & Sugiyama, M. (2019). Are anchor points really indispensable in label-noise learning? In: Proceedings of NeurIPS, pp 6835–6846
Xia, X., Liu, T., Han, B., Wang, N., Gong, M., Liu, H., Niu, G., Tao, D., & Sugiyama, M. (2020). Part-dependent label noise: Towards instance-dependent label noise. In: Proceedings of NeurIPS, pp 7597–7610
Xia, X., Liu, T., Han, B., Gong, C., Wang, N., Ge, Z., & Chang, Y. (2021). Robust early-learning: Hindering the memorization of noisy labels. In: Proceedings of ICLR
Xia, X., Liu, T., Han, B., Gong, M., Yu, J., Niu, G., & Sugiyama, M. (2022). Sample selection with uncertainty of losses for learning with noisy labels. In: Proceedings of ICLR
Xia, X., Han, B., Zhan, Y., Yu, J., Gong, M., Gong, C., & Liu, T. (2023). Combating noisy labels with sample selection by mining high-discrepancy examples. In: Proceedings of ICCV, pp 1833–1843
Xiao, T., Xia, T., Yang, Y., Huang, C., & Wang, X. (2015). Learning from massive noisy labeled data for image classification. In: Proceedings of CVPR, pp 2691–2699
Xu, Y., Cao, P., Kong, Y., & Wang, Y. (2019). L_dmi: A novel information-theoretic loss function for training deep nets robust to label noise. In: Proceedings of NeurIPS, pp 6222–6233
Xu, Y., Niu, X., Yang, J., Drew, S., Zhou, J., & Chen, R. (2023). USDNL: uncertainty-based single dropout in noisy label learning. In: Proceedings of AAAI, pp 10648–10656
Yang, S., Yang, E., Han, B., Liu, Y., Xu, M., Niu, G., & Liu, T. (2022). Estimating instance-dependent bayes-label transition matrix using a deep neural network. In: Proceedings of ICML, pp 25302–25312
Yang, S., Wu, S., Yang, E., Han, B., Liu, Y., Xu, M., Niu, G., & Liu, T. (2023). A parametrical model for instance-dependent label noise. IEEE Trans Pattern Anal Mach Intell, 45(12), 14055–14068.
Yao, Y., Sun, Z., Zhang, C., Shen, F., Wu, Q., Zhang, J., & Tang, Z. (2021). Jo-SRC: A contrastive approach for combating noisy labels. In: Proceedings of CVPR, pp 5192–5201
Yi, K., & Wu, J. (2019). Probabilistic end-to-end noise correction for learning with noisy labels. In: Proceedings of CVPR, pp 7017–7025
Yi, R., Guan, D., Huang, Y., & Lu, S. (2023). Class-independent regularization for learning with noisy labels. In: Proceedings of AAAI, pp 3276–3284
Yu, X., Han, B., Yao, J., Niu, G., Tsang, I., & Sugiyama, M. (2019a). How does disagreement help generalization against label corruption? In: Proceedings of ICML, pp 7164–7173
Yu, X., Han, B., Yao, J., Niu, G., Tsang, I. W., & Sugiyama, M. (2019b). How does disagreement help generalization against label corruption? In: Proceedings of ICML, pp 7164–7173
Zhang, C., Bengio, S., Hardt, M., Recht, B., & Vinyals, O. (2017). Understanding deep learning requires rethinking generalization. In: Proceedings of ICLR
Zhang, C. B., Jiang, P. T., Hou, Q., Wei, Y., Han, Q., Li, Z., & Cheng, M. M. (2021). Delving deep into label smoothing. IEEE Trans Image Process, 30, 5984–5996.
Zhang, H., Dana, K., Shi, J., Zhang, Z., Wang, X., Tyagi, A., & Agrawal, A. (2018). Context encoding for semantic segmentation. In: Proceedings of CVPR, pp 7151–7160
Zhang, J., Song, B., Wang, H., Han, B., Liu, T., Liu, L., & Sugiyama, M. (2024). Badlabel: A robust perspective on evaluating and enhancing label-noise learning. IEEE Trans Pattern Anal Mach Intell, 46(6), 4398–4409.
Zhang, X., Zou, J., He, K., & Sun, J. (2015). Accelerating very deep convolutional networks for classification and detection. IEEE Trans Pattern Anal Mach Intell, 38(10), 1943–1955.
Zhang, Y., Niu, G., & Sugiyama, M. (2021b). Learning noise transition matrix from only noisy labels via total variation regularization. In: Proceedings of ICML, pp 12501–12512
Zhu, Z., Liu, T., & Liu, Y. (2021). A second-order approach to learning with instance-dependent label noise. In: Proceedings of CVPR, pp 10113–10123
Funding
This work was supported in part by the Macau Science and Technology Development Fund under SKLIOTSC-2021-2023, 0022/2022/A1, and 0119/2024/RIB2; in part by the Research Committee of the University of Macau under MYRG-GRG2023-00058-FST-UMDF; and in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2024A1515012536.
Additional information
Communicated by Bryan Allen Plummer.
Proofs
1.1 Proof of Theorem 1
We first introduce necessary lemmas that will facilitate our proof.
Lemma 1
The classifier \(\hat{f}_{\tau }\) with the threshold \(\tau \) can be expressed as:
Proof
By definition, we have:
Therefore,
Lemma 2
(\(\alpha \)-weighted Bayes Optimal (Scott, 2012)) Define \(U_{\alpha }\)-risk under distribution D as
Then, \(f^{*}_{\alpha }=\textrm{sign}(\eta (x)-\alpha )\) minimizes \(U_{\alpha }\)-risk.
Now we proceed to prove Theorem 1 and Corollary 1.
Proof
By Lemma 1, the optimal threshold \(\tau ^*\) is expected to satisfy:
Solving this equation yields \(\tau ^*=\frac{1}{2}+\frac{\beta }{2}(\rho _{-1}-\rho _{+1})\).
Applying Lemma 2, it follows directly that the \(\alpha \)-weighted Bayes classifier under noisy distribution is:
Combining this with the result from Lemma 1, we observe that when
it holds that \(\hat{f}_{\tau }=\tilde{f}_{\alpha }^*\). By selecting the optimal threshold \(\tau ^*\), we obtain \(\alpha ^*=\frac{1-\rho _{+1}+\rho _{-1}}{2}\), as required.
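As a quick numerical check of the closed-form expressions above (the noise rates \(\rho _{+1}=0.2\), \(\rho _{-1}=0.4\) and the choice \(\beta =1\) are purely illustrative), the short Python snippet below evaluates \(\tau ^*\) and \(\alpha ^*\); note that for \(\beta =1\) the two quantities coincide.

```python
def optimal_threshold(rho_pos, rho_neg, beta):
    """tau* = 1/2 + (beta / 2) * (rho_{-1} - rho_{+1}), as derived in the proof of Theorem 1."""
    return 0.5 + 0.5 * beta * (rho_neg - rho_pos)

def optimal_alpha(rho_pos, rho_neg):
    """alpha* = (1 - rho_{+1} + rho_{-1}) / 2, as stated in Theorem 1."""
    return 0.5 * (1.0 - rho_pos + rho_neg)

# Illustrative noise rates: 20% of positives and 40% of negatives are mislabeled.
rho_pos, rho_neg = 0.2, 0.4
print(optimal_threshold(rho_pos, rho_neg, beta=1.0))  # 0.6
print(optimal_alpha(rho_pos, rho_neg))                # 0.6 (coincides when beta = 1)
```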
1.2 Proof of Theorem 2
Proof
For convenience, we denote \(\ell _{\textrm{RML}}(f(\varvec{x}),\tilde{y})\) by \(\ell _{\textrm{RML}}\). First, with regroup number n for RML++, observe that the event \(\left\{ |\ell _{\textrm{RML}}-\hat{\mu }|>\epsilon \right\} \) implies that at least \(\frac{n+1}{2}\) of the mean losses \(\ell ^s_i\) in \(\mathcal {W}^{*}\) must lie farther than \(\epsilon \) from \(\hat{\mu }\). Namely,
Define \(Z_i=\mathbb {1}(|\ell ^s_i-\hat{\mu }|>\epsilon )\). Let \(p_{\epsilon ,k}=\mathbb {E}[Z_i]=P\left( |\ell ^s_i-\hat{\mu }|>\epsilon \right) \), \(i=1,\dots ,n\), and \(p_{\epsilon }=\mathbb {E}[Z_{n+1}]=P\left( |\ell -\hat{\mu }|>\epsilon \right) \), then we can obtain
Because the \(Z_i\) are independent random variables bounded by 1, the one-sided Hoeffding inequality gives, for any \(\tau >0\),
As a result,
Since \(\textrm{Var}_{\varvec{X^*}|\tilde{Y}=\tilde{y}}[\ell (f(\varvec{X^{*}}),\tilde{y})]=\hat{\sigma }^2<\infty \), Chebyshev's inequality yields
Then the bound is
where \(\mathrm {C_1}=2(n+1)\) and \(\mathrm {C_2}=\frac{n+k}{k(n+1)}>0\). \(\square \)
Remark 2
Note that after the operations in Eqs. (1) and (5), the losses are intuitively closer to the true loss, so the number of \(\ell _i^s\) lying farther than \(\epsilon \) from \(\hat{\mu }\) decreases. Therefore, \(\sum ^{n+1}_{i=1} \mathbb {1}\left( |\ell ^s_i-\hat{\mu }|>\epsilon \right) \) decreases and so does \(P\left( \sum ^{n+1}_{i=1} Z_i \geqslant \frac{n+1}{2}\right) \); that is, the upper bound in Theorem 2 is reduced.
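To make the intuition in Remark 2 and the concentration behavior bounded in Theorem 2 concrete, the illustrative simulation below compares the plain mean of per-sample losses with the median of group means when a small fraction of losses is heavily corrupted. It is not an experiment from the paper; the exponential loss distributions, corruption level, group count, and sample sizes are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def median_of_group_means(losses, n_groups):
    """Median of per-group means (a median-of-means style aggregation related to Theorem 2)."""
    groups = np.array_split(rng.permutation(losses), n_groups)
    return np.median([g.mean() for g in groups])

# Simulate per-sample losses: most are small ("clean"), a few are huge ("mislabeled").
clean = rng.exponential(scale=0.5, size=190)
corrupted = rng.exponential(scale=20.0, size=10)
losses = np.concatenate([clean, corrupted])

print("plain mean            :", losses.mean())
print("median of group means :", median_of_group_means(losses, n_groups=21))
print("mean of clean losses  :", clean.mean())
```

With these arbitrary settings the plain mean is noticeably inflated by the corrupted losses, whereas the median of group means stays much closer to the clean-only mean, mirroring the reduced deviation probability discussed in Remark 2.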