Abstract
Training deep neural networks (DNNs) typically requires large-scale, high-quality annotated datasets. However, because precisely annotating vast numbers of training samples is inherently difficult, label noise, i.e., potentially erroneous annotations, is common in practice and detrimental to model performance. To combat the negative impact of label noise, mainstream studies currently follow a pipeline that begins with data sampling and is followed by loss correction. Data sampling aims to partition the original training dataset into clean and noisy subsets, but it often suffers from biased sampling that can mislead models. In addition, loss correction typically requires the noise rate as a priori knowledge, which can be difficult to estimate precisely. To this end, we propose a novel method, Regroup Median Loss Plus Plus (RML++), that addresses both drawbacks. Specifically, the training dataset is partitioned into clean and noisy subsets by a newly designed separation approach that combines prediction consistency with an adaptive threshold to ensure reliable sampling. Moreover, to enable models to learn robustly from the noisy subset, we estimate the losses of noisy training samples using same-class samples from the clean subset. The proposed method then corrects the labels of noisy samples based on model predictions with the regularization of RML++. Compared to state-of-the-art (SOTA) methods, RML++ achieves significant improvements on both synthetic and challenging real-world datasets. The source code is available at https://github.com/Feng-peng-Li/RML-Extension.
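As a rough illustration of the regroup-median idea that RML++ builds on, the NumPy sketch below estimates a robust loss for a single noisily labeled sample from the losses of same-class samples drawn from the clean subset. It is a minimal sketch under assumed inputs: the function regroup_median_loss, its parameters, and the grouping scheme are illustrative and do not reproduce the exact procedure in the paper or the released code.

```python
import numpy as np

def regroup_median_loss(noisy_loss, clean_losses, n_groups=5, seed=None):
    """Robust loss estimate for one noisily labeled sample (illustrative sketch).

    noisy_loss:   loss of the sample computed with its possibly corrupted label.
    clean_losses: 1-D array of losses of same-class samples from the clean subset.
    n_groups:     number of groups used for the regroup (median-of-means style) estimate.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.asarray(clean_losses, dtype=float))
    # Split the clean-sample losses into groups and average each group
    # together with the noisy sample's own loss.
    groups = np.array_split(shuffled, n_groups)
    group_means = [np.mean(np.append(g, noisy_loss)) for g in groups]
    # Taking the median of the group means damps the influence of a corrupted label.
    return float(np.median(group_means))

# Toy usage: a large loss caused by a wrong label is pulled toward the clean-sample statistics.
clean = np.random.default_rng(0).exponential(scale=0.5, size=50)
print(regroup_median_loss(noisy_loss=6.0, clean_losses=clean, seed=0))
```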
References
Arazo, E., Ortego, D., Albert, P., O’Connor, N. E., & McGuinness, K. (2019). Unsupervised label noise modeling and loss correction. In: Proceedings of ICML, pp 312–321
Arpit, D., Jastrzebski, S., Ballas, N., Krueger, D., Bengio, E., Kanwal, M. S., Maharaj, T., Fischer, A., Courville, A., Bengio, Y., & Lacoste-Julien, S. (2017). A closer look at memorization in deep networks. In: Proceedings of ICML, pp 233–242
Bai, Y., Yang, E., Han, B., Yang, Y., Li, J., Mao, Y., Niu, G., & Liu, T. (2021). Understanding and improving early stopping for learning with noisy labels. In: Proceedings of NeurIPS, pp 24392–24403
Berthelot, D., Carlini, N., Goodfellow, I.J., Papernot, N., Oliver, A., & Raffel, C. (2019). Mixmatch: A holistic approach to semi-supervised learning. In: Proceedings of NeurIPS, pp 5050–5060
Bossard, L., Guillaumin, M., & Gool, L. V. (2014). Food-101 - mining discriminative components with random forests. In: Proceedings of ECCV, pp 446–461
Cheng, H., Zhu, Z., Li, X., Gong, Y., Sun, X., & Liu, Y. (2021). Learning with instance-dependent label noise: A sample sieve approach. In: Proceedings of ICLR
Cui, X., Saon, G., Nagano, T., Suzuki, M., Fukuda, T., Kingsbury, B., & Kurata, G. (2022). Improving generalization of deep neural network acoustic models with length perturbation and n-best based label smoothing. In: INTERSPEECH, pp 2638–2642
Frénay, B., & Verleysen, M. (2013). Classification in the presence of label noise: a survey. IEEE Trans Neural Netw Learn Syst, 25(5), 845–869.
Girshick, R. (2015). Fast r-cnn. In: Proceedings of ICCV, pp 1440–1448
Gui, X., Wang, W., & Tian, Z. (2021). Towards understanding deep learning from noisy labels with small-loss criterion. In: Proceedings of IJCAI, pp 2469–2475
Han, B., Yao, Q., Yu, X., Niu, G., Xu, M., Hu, W., Tsang, I. W., & Sugiyama, M. (2018). Co-teaching: Robust training of deep neural networks with extremely noisy labels. In: Proceedings of NeurIPS, pp 8536–8546
He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask r-cnn. In: Proceedings of ICCV, pp 2961–2969
Iscen, A., Valmadre, J., Arnab, A., & Schmid, C. (2022). Learning with neighbor consistency for noisy labels. In: Proceedings of CVPR, pp 4662–4671
Karim, N., Rizve, M. N., Rahnavard, N., Mian, A., & Shah, M. (2022). Unicon: Combating label noise through uniform selection and contrastive learning. In: Proceedings of CVPR, pp 9676–9686
Kim, D., Ryoo, K., Cho, H., & Kim, S. (2024). Splitnet: learnable clean-noisy label splitting for learning with noisy labels. Int J Comput Vis, 1(18), 1–18.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In: Proceedings of NeurIPS, pp 1106–1114
Lecué, G., & Lerasle, M. (2020). Robust machine learning by median-of-means: Theory and practice. Ann Stat, 48(2), 906–931.
Lee, K., He, X., Zhang, L., & Yang, L. (2018). Cleannet: Transfer learning for scalable image classifier training with label noise. In: Proceedings of CVPR, pp 5447–5456
Li, F., Li, K., Tian, J., & Zhou, J. (2024). Regroup median loss for combating label noise. In: Proceedings of AAAI, pp 13474–13482
Li, J., Wong, Y., Zhao, Q., & Kankanhalli, M. S. (2019). Learning to learn from noisy labeled data. In: Proceedings of CVPR, pp 5051–5059
Li, J., Socher, R., Hoi, S. C. (2020). Dividemix: Learning with noisy labels as semi-supervised learning. In: Proceedings of ICLR, pp 824–832
Li, S., Xia, X., Ge, S., & Liu, T. (2022a). Selective-supervised contrastive learning with noisy labels. In: Proceedings of CVPR, pp 316–325
Li, S., Xia, X., Zhang, H., Zhan, Y., Ge, S., & Liu, T. (2022b). Estimating noise transition matrix with label correlations for noisy multi-label learning. In: Proceedings of NeurIPS
Li, S., Xia, X., Zhang, H., Ge, S., & Liu, T. (2023a). Multi-label noise transition matrix estimation with label correlations: Theory and algorithm. CoRR arXiv:2309.12706
Li, W., Wang, L., Li, W., Agustsson, E., & Van Gool, L. (2017). Webvision database: Visual learning and understanding from web data. arXiv preprint arXiv:1708.02862
Li, X., Liu, T., Han, B., Niu, G., & Sugiyama, M. (2021). Provably end-to-end label-noise learning without anchor points. In: Proceedings of ICML, pp 6403–6413
Li, Y., Han, H., Shan, S., & Chen, X. (2023b). DISC: learning from noisy labels via dynamic instance-specific selection and correction. In: Proceedings of CVPR, pp 24070–24079
Liu, S., Niles-Weed, J., Razavian, N., & Fernandez-Granda, C. (2020). Early-learning regularization prevents memorization of noisy labels. In: Proceedings of NeurIPS, pp 20331–20342
Liu, T., & Tao, D. (2016). Classification with noisy labels by importance reweighting. IEEE Trans Pattern Anal Mach Intell, 38(3), 447–461.
Liu, Y., & Guo, H. (2020). Peer loss functions: Learning from noisy labels without knowing noise rates. In: Proceedings of ICML, pp 6226–6236
Müller, R., Kornblith, S., & Hinton, G. E. (2019). When does label smoothing help? In: Proceedings of NeurIPS, pp 4696–4705
Natarajan, N., Dhillon, I. S., Ravikumar, P., & Tewari, A. (2013). Learning with noisy labels. In: Proceedings of NeurIPS, pp 1196–1204
Ortego, D., Arazo, E., Albert, P., O’Connor, N. E., & McGuinness, K. (2021). Multi-objective interpolation training for robustness to label noise. In: Proceedings of CVPR, pp 6606–6615
Patrini, G., Rozza, A., Menon, A. K., Nock, R., & Qu, L. (2017). Making deep neural networks robust to label noise: A loss correction approach. In: Proceedings of CVPR, pp 2233–2241
Rasmussen, C. (1999). The infinite gaussian mixture model. In: Proceedings of NeurIPS, pp 554–560
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In: Proceedings of CVPR, pp 779–788
Scott, C. (2012). Calibrated asymmetric surrogate losses. Electronic J of Stat, 6, 958–992.
Staerman, G., Laforgue, P., Mozharovskyi, P., & d’Alché Buc, F. (2021). When ot meets mom: Robust estimation of wasserstein distance. In: Proceedings of AISTATS, pp 136–144
Sun, H., Wei, Q., Feng, L., Hu, Y., Liu, F., Fan, H., & Yin, Y. (2024). Variational rectification inference for learning with noisy labels. Int J Comput Vis, 1(18), 1–20.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2015). Going deeper with convolutions. In: Proceedings of CVPR, pp 1–9
Tanaka, D., Ikami, D., Yamasaki, T., & Aizawa, K. (2018). Joint optimization framework for learning with noisy labels. In: Proceedings of CVPR, pp 5552–5560
Tarvainen, A., & Valpola, H. (2017). Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In: Proceedings of NeurIPS, pp 1195–1204
Wang, J., Xia, X., Lan, L., Wu, X., Yu, J., Yang, W., Han, B., & Liu, T. (2024). Tackling noisy labels with network parameter additive decomposition. IEEE Trans Pattern Anal Mach Intell, 46(9), 6341–6354.
Wang, Y., Chen, H., Heng, Q., Hou, W., Fan, Y., Wu, Z., Wang, J., Savvides, M., Shinozaki, T., Raj, B., Schiele, B., & Xie, X. (2023). Freematch: Self-adaptive thresholding for semi-supervised learning. In: Proceedings of ICLR
Wei, H., Feng, L., Chen, X., & An, B. (2020). Combating noisy labels by agreement: A joint training method with co-regularization. In: Proceedings of CVPR, pp 13723–13732
Wei, J., Liu, H., Liu, T., Niu, G., Sugiyama, M., & Liu, Y. (2022a). To smooth or not? when label smoothing meets noisy labels. In: Proceedings of ICML, pp 23589–23614
Wei, J., Zhu, Z., Cheng, H., Liu, T., Niu, G., & Liu, Y. (2022b). Learning with noisy labels revisited: A study using real-world human annotations. In: Proceedings of ICLR
Xia, X., Liu, T., Wang, N., Han, B., Gong, C., Niu, G., & Sugiyama, M. (2019). Are anchor points really indispensable in label-noise learning? In: Proceedings of NeurIPS, pp 6835–6846
Xia, X., Liu, T., Han, B., Wang, N., Gong, M., Liu, H., Niu, G., Tao, D., & Sugiyama, M. (2020). Part-dependent label noise: Towards instance-dependent label noise. In: Proceedings of NeurIPS, pp 7597–7610
Xia, X., Liu, T., Han, B., Gong, C., Wang, N., Ge, Z., & Chang, Y. (2021). Robust early-learning: Hindering the memorization of noisy labels. In: Proceedings of ICLR
Xia, X., Liu, T., Han, B., Gong, M., Yu, J., Niu, G., & Sugiyama, M. (2022). Sample selection with uncertainty of losses for learning with noisy labels. In: Proceedings of ICLR
Xia, X., Han, B., Zhan, Y., Yu, J., Gong, M., Gong, C., & Liu, T. (2023). Combating noisy labels with sample selection by mining high-discrepancy examples. In: Proceedings of ICCV, pp 1833–1843
Xiao, T., Xia, T., Yang, Y., Huang, C., & Wang, X. (2015). Learning from massive noisy labeled data for image classification. In: Proceedings of CVPR, pp 2691–2699
Xu, Y., Cao, P., Kong, Y., & Wang, Y. (2019). L_dmi: A novel information-theoretic loss function for training deep nets robust to label noise. In: Proceedings of NeurIPS, pp 6222–6233
Xu, Y., Niu, X., Yang, J., Drew, S., Zhou, J., & Chen, R. (2023). USDNL: uncertainty-based single dropout in noisy label learning. In: Proceedings of AAAI, pp 10648–10656
Yang, S., Yang, E., Han, B., Liu, Y., Xu, M., Niu, G., & Liu, T. (2022). Estimating instance-dependent bayes-label transition matrix using a deep neural network. In: Proceedings of ICML, pp 25302–25312
Yang, S., Wu, S., Yang, E., Han, B., Liu, Y., Xu, M., Niu, G., & Liu, T. (2023). A parametrical model for instance-dependent label noise. IEEE Trans Pattern Anal Mach Intell, 45(12), 14055–14068.
Yao, Y., Sun, Z., Zhang, C., Shen, F., Wu, Q., Zhang, J., & Tang, Z. (2021). Jo-SRC: A contrastive approach for combating noisy labels. In: Proceedings of CVPR, pp 5192–5201
Yi, K., & Wu, J. (2019). Probabilistic end-to-end noise correction for learning with noisy labels. In: Proceedings of CVPR, pp 7017–7025
Yi, R., Guan, D., Huang, Y., & Lu, S. (2023). Class-independent regularization for learning with noisy labels. In: Proceedings of AAAI, pp 3276–3284
Yu, X., Han, B., Yao, J., Niu, G., Tsang, I., & Sugiyama, M. (2019a). How does disagreement help generalization against label corruption? In: Proceedings of ICML, pp 7164–7173
Yu, X., Han, B., Yao, J., Niu, G., Tsang, I. W., & Sugiyama, M. (2019b). How does disagreement help generalization against label corruption? In: Proceedings of ICML, pp 7164–7173
Zhang, C., Bengio, S., Hardt, M., Recht, B., & Vinyals, O. (2017). Understanding deep learning requires rethinking generalization. In: Proceedings of ICLR
Zhang, C. B., Jiang, P. T., Hou, Q., Wei, Y., Han, Q., Li, Z., & Cheng, M. M. (2021). Delving deep into label smoothing. IEEE Trans Image Process, 30, 5984–5996.
Zhang, H., Dana, K., Shi, J., Zhang, Z., Wang, X., Tyagi, A., & Agrawal, A. (2018). Context encoding for semantic segmentation. In: Proceedings of CVPR, pp 7151–7160
Zhang, J., Song, B., Wang, H., Han, B., Liu, T., Liu, L., & Sugiyama, M. (2024). Badlabel: A robust perspective on evaluating and enhancing label-noise learning. IEEE Trans Pattern Anal Mach Intell, 46(6), 4398–4409.
Zhang, X., Zou, J., He, K., & Sun, J. (2015). Accelerating very deep convolutional networks for classification and detection. IEEE Trans Pattern Anal Mach Intell, 38(10), 1943–1955.
Zhang, Y., Niu, G., & Sugiyama, M. (2021b). Learning noise transition matrix from only noisy labels via total variation regularization. In: Proceedings of ICML, pp 12501–12512
Zhu, Z., Liu, T., & Liu, Y. (2021). A second-order approach to learning with instance-dependent label noise. In: Proceedings of CVPR, pp 10113–10123
Funding
This work was supported in part by the Macau Science and Technology Development Fund under SKLIOTSC-2021-2023, 0022/2022/A1, and 0119/2024/RIB2; in part by the Research Committee of the University of Macau under MYRG-GRG2023-00058-FST-UMDF; and in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2024A1515012536.
Additional information
Communicated by Bryan Allen Plummer.
Proofs
1.1 Proof of Theorem 1
We first introduce necessary lemmas that will facilitate our proof.
Lemma 1
The classifier \(\hat{f}_{\tau }\) with the threshold \(\tau \) can be expressed as:
Proof
By definition, we have:
Therefore,
Lemma 2
(\(\alpha \)-weighted Bayes Optimal (Scott, 2012)) Define \(U_{\alpha }\)-risk under distribution D as
Then, \(f^{*}_{\alpha }=\textrm{sign}(\eta (x)-\alpha )\) minimizes \(U_{\alpha }\)-risk.
Now we proceed to prove Theorem 1 and Corollary 1.
Proof
By Lemma 1, the optimal threshold \(\tau ^*\) is expected to satisfy:
Solving this equation yields \(\tau ^*=\frac{1}{2}+\frac{\beta }{2}(\rho _{-1}-\rho _{+1})\).
Applying Lemma 2, it follows directly that the \(\alpha \)-weighted Bayes classifier under noisy distribution is:
Combining this with the result from Lemma 1, we observe that when
it holds that \(\hat{f}_{\tau }=\tilde{f}_{\alpha }^*\). By selecting the optimal threshold \(\tau ^*\), we obtain \(\alpha ^*=\frac{1-\rho _{+1}+\rho _{-1}}{2}\), as required.
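As a quick numerical check of the closed-form expressions above (the noise rates \(\rho _{+1}=0.2\), \(\rho _{-1}=0.4\) and the choice \(\beta =1\) are purely illustrative), the short Python snippet below evaluates \(\tau ^*\) and \(\alpha ^*\); note that for \(\beta =1\) the two quantities coincide.

```python
def optimal_threshold(rho_pos, rho_neg, beta):
    """tau* = 1/2 + (beta / 2) * (rho_{-1} - rho_{+1}), as derived in the proof of Theorem 1."""
    return 0.5 + 0.5 * beta * (rho_neg - rho_pos)

def optimal_alpha(rho_pos, rho_neg):
    """alpha* = (1 - rho_{+1} + rho_{-1}) / 2, as stated in Theorem 1."""
    return 0.5 * (1.0 - rho_pos + rho_neg)

# Illustrative noise rates: 20% of positives and 40% of negatives are mislabeled.
rho_pos, rho_neg = 0.2, 0.4
print(optimal_threshold(rho_pos, rho_neg, beta=1.0))  # 0.6
print(optimal_alpha(rho_pos, rho_neg))                # 0.6 (coincides when beta = 1)
```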
1.2 Proof of Theorem 2
Proof
For convenience, we denote \(\ell _{\textrm{RML}}(f(\varvec{x}),\tilde{y})\) by \(\ell _{\textrm{RML}}\). First, with regroup number n for RML++, observe that the event \(\left\{ |\ell _{\textrm{RML}}-\hat{\mu }|>\epsilon \right\} \) implies that at least \(\frac{n+1}{2}\) of the mean losses \(\ell ^s_i\) in \(\mathcal {W}^{*}\) must lie farther than \(\epsilon \) from \(\hat{\mu }\). Namely,
Define \(Z_i=\mathbb {1}(|\ell ^s_i-\hat{\mu }|>\epsilon )\). Let \(p_{\epsilon ,k}=\mathbb {E}[Z_i]=P\left( |\ell ^s_i-\hat{\mu }|>\epsilon \right) \), \(i=1,\dots ,n\), and \(p_{\epsilon }=\mathbb {E}[Z_{n+1}]=P\left( |\ell -\hat{\mu }|>\epsilon \right) \), then we can obtain
Because the \(Z_i\) are independent random variables bounded by 1, the one-sided Hoeffding inequality gives, for any \(\tau >0\),
As a result,
Since \(\textrm{Var}_{\varvec{X^*}|\tilde{Y}=\tilde{y}}[\ell (f(\varvec{X^{*}}),\tilde{y})]=\hat{\sigma }^2<\infty \), Chebyshev's inequality yields
Then the bound is
where \(\mathrm {C_1}=2(n+1)\) and \(\mathrm {C_2}=\frac{n+k}{k(n+1)}>0\). \(\square \)
Remark 2
Note that after the operations in Eqs. (1) and (5), the losses are intuitively closer to the true loss, so the number of \(\ell _i^s\) lying farther than \(\epsilon \) from \(\hat{\mu }\) decreases. Therefore, \(\sum ^{n+1}_{i=1} \mathbb {1}\left( |\ell ^s_i-\hat{\mu }|>\epsilon \right) \) decreases and so does \(P\left( \sum ^{n+1}_{i=1} Z_i \geqslant \frac{n+1}{2}\right) \); that is, the upper bound in Theorem 2 is reduced.
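To make the intuition in Remark 2 and the concentration behavior bounded in Theorem 2 concrete, the illustrative simulation below compares the plain mean of per-sample losses with the median of group means when a small fraction of losses is heavily corrupted. It is not an experiment from the paper; the exponential loss distributions, corruption level, group count, and sample sizes are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def median_of_group_means(losses, n_groups):
    """Median of per-group means (a median-of-means style aggregation related to Theorem 2)."""
    groups = np.array_split(rng.permutation(losses), n_groups)
    return np.median([g.mean() for g in groups])

# Simulate per-sample losses: most are small ("clean"), a few are huge ("mislabeled").
clean = rng.exponential(scale=0.5, size=190)
corrupted = rng.exponential(scale=20.0, size=10)
losses = np.concatenate([clean, corrupted])

print("plain mean            :", losses.mean())
print("median of group means :", median_of_group_means(losses, n_groups=21))
print("mean of clean losses  :", clean.mean())
```

With these arbitrary settings the plain mean is noticeably inflated by the corrupted losses, whereas the median of group means stays much closer to the clean-only mean, mirroring the reduced deviation probability discussed in Remark 2.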