RML++: Regroup Median Loss for Combating Label Noise

  • Published:
International Journal of Computer Vision

Abstract

Training deep neural networks (DNNs) typically requires large-scale, high-quality annotated datasets. However, because precisely annotating vast numbers of training samples is inherently difficult, label noise, i.e., potentially erroneous annotations, is common yet detrimental in practice. To combat the negative impact of label noise, mainstream studies currently follow a pipeline that begins with data sampling and is followed by loss correction. Data sampling aims to partition the original training dataset into clean and noisy subsets, but it often suffers from biased sampling that can mislead models. Loss correction, in turn, typically requires the noise rate as prior information, whose precise estimation can be challenging. To this end, we propose a novel method, Regroup Median Loss Plus Plus (RML++), that addresses both drawbacks. Specifically, the training dataset is partitioned into clean and noisy subsets using a newly designed separation approach, which synergistically combines prediction consistency with an adaptive threshold to ensure reliable sampling. Moreover, to ensure that models can robustly learn from the noisy subset, we propose estimating the losses of noisy training samples using same-class samples from the clean subset. Subsequently, the proposed method corrects the labels of noisy samples based on model predictions with the regularization of RML++. Compared with state-of-the-art (SOTA) methods, RML++ achieves significant improvements on both synthetic and challenging real-world datasets. The source code is available at https://github.com/Feng-peng-Li/RML-Extension.
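
To make the abstract's description more concrete, the following is a minimal PyTorch-style sketch of the regroup-median idea only: the loss of a suspect sample is replaced by the median over per-group mean losses of same-class samples drawn from the clean subset, together with the sample's own loss. The function name, arguments, and grouping scheme are illustrative assumptions, not the released RML++ implementation; the clean/noisy separation, adaptive threshold, and label-correction steps are omitted and can be found in the linked repository.

import torch
import torch.nn.functional as F

def regroup_median_loss_sketch(noisy_logits, noisy_label,
                               clean_logits, clean_labels, n_groups=9):
    # Loss of the (possibly mislabeled) sample itself.
    own_loss = F.cross_entropy(noisy_logits.unsqueeze(0), noisy_label.unsqueeze(0))
    # Per-sample losses of clean samples sharing the same (noisy) class label.
    clean_losses = F.cross_entropy(clean_logits, clean_labels, reduction="none")
    # Regroup the clean losses and average within each group.
    group_means = torch.stack([g.mean() for g in clean_losses.chunk(n_groups)])
    # Median over the group means and the sample's own loss: a robust estimate
    # of the loss this sample would incur if its label were correct.
    return torch.cat([group_means, own_loss.unsqueeze(0)]).median()

# Illustrative usage with random tensors (10 classes, 90 clean same-class samples).
logits = torch.randn(10)
label = torch.tensor(3)
clean_logits = torch.randn(90, 10)
clean_labels = torch.full((90,), 3)
print(regroup_median_loss_sketch(logits, label, clean_logits, clean_labels))

Here a single call replaces one sample's loss; in practice the separation of clean and noisy subsets and the label-correction step described above determine which samples this estimate is applied to.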

References

  • Arazo, E., Ortego, D., Albert, P., O’Connor, N. E., & McGuinness, K. (2019). Unsupervised label noise modeling and loss correction. In: Proceedings of ICML, pp 312–321

  • Arpit, D., Jastrzebski, S., Ballas, N., Krueger, D., Bengio, E., Kanwal, M. S., Maharaj, T., Fischer, A., Courville, A., Bengio, Y., & Lacoste-Julien, S. (2017). A closer look at memorization in deep networks. In: Proceedings of ICML, pp 233–242

  • Bai, Y., Yang, E., Han, B., Yang, Y., Li, J., Mao, Y., Niu, G., & Liu, T. (2021). Understanding and improving early stopping for learning with noisy labels. In: Proceedings of NeurIPS, pp 24392–24403

  • Berthelot, D., Carlini, N., Goodfellow, I.J., Papernot, N., Oliver, A., & Raffel, C. (2019). Mixmatch: A holistic approach to semi-supervised learning. In: Proceedings of NeurIPS, pp 5050–5060

  • Bossard, L., Guillaumin, M., & Gool, L. V. (2014). Food-101 - mining discriminative components with random forests. In: Proceedings of ECCV, pp 446–461

  • Cheng, H., Zhu, Z., Li, X., Gong, Y., Sun, X., & Liu, Y. (2021). Learning with instance-dependent label noise: A sample sieve approach. In: Proceedings of ICLR

  • Cui, X., Saon, G., Nagano, T., Suzuki, M., Fukuda, T., Kingsbury, B., & Kurata, G. (2022). Improving generalization of deep neural network acoustic models with length perturbation and n-best based label smoothing. In: INTERSPEECH, pp 2638–2642

  • Frénay, B., & Verleysen, M. (2013). Classification in the presence of label noise: a survey. IEEE Trans Neural Netw Learn Syst, 25(5), 845–869.

  • Girshick, R. (2015). Fast r-cnn. In: Proceedings of ICCV, pp 1440–1448

  • Gui, X., Wang, W., & Tian, Z. (2021). Towards understanding deep learning from noisy labels with small-loss criterion. In: Proceedings of IJCAI, pp 2469–2475

  • Han, B., Yao, Q., Yu, X., Niu, G., Xu, M., Hu, W., Tsang, I. W., & Sugiyama, M. (2018). Co-teaching: Robust training of deep neural networks with extremely noisy labels. In: Proceedings of NeurIPS, pp 8536–8546

  • He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask r-cnn. In: Proceedings of ICCV, pp 2961–2969

  • Iscen, A., Valmadre, J., Arnab, A., & Schmid, C. (2022). Learning with neighbor consistency for noisy labels. In: Proceedings of CVPR, pp 4662–4671

  • Karim, N., Rizve, M. N., Rahnavard, N., Mian, A., & Shah, M. (2022). Unicon: Combating label noise through uniform selection and contrastive learning. In: Proceedings of CVPR, pp 9676–9686

  • Kim, D., Ryoo, K., Cho, H., & Kim, S. (2024). Splitnet: learnable clean-noisy label splitting for learning with noisy labels. Int J Comput Vis, 1(18), 1–18.

  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In: Proceedings of NeurIPS, pp 1106–1114

  • Lecué, G., & Lerasle, M. (2020). Robust machine learning by median-of-means: Theory and practice. Ann Stat, 48(2), 906–931.

  • Lee, K., He, X., Zhang, L., & Yang, L. (2018). Cleannet: Transfer learning for scalable image classifier training with label noise. In: Proceedings of CVPR, pp 5447–5456

  • Li, F., Li, K., Tian, J., & Zhou, J. (2024). Regroup median loss for combating label noise. In: Proceedings of AAAI, pp 13474–13482

  • Li, J., Wong, Y., Zhao, Q., & Kankanhalli, M. S. (2019). Learning to learn from noisy labeled data. In: Proceedings of CVPR, pp 5051–5059

  • Li, J., Socher, R., Hoi, S. C. (2020). Dividemix: Learning with noisy labels as semi-supervised learning. In: Proceedings of ICLR, pp 824–832

  • Li, S., Xia, X., Ge, S., & Liu, T. (2022a). Selective-supervised contrastive learning with noisy labels. In: Proceedings of CVPR, pp 316–325

  • Li, S., Xia, X., Zhang, H., Zhan, Y., Ge, S., & Liu, T. (2022b). Estimating noise transition matrix with label correlations for noisy multi-label learning. In: Proceedings of NeurIPS

  • Li, S., Xia, X., Zhang, H., Ge, S., & Liu, T. (2023a). Multi-label noise transition matrix estimation with label correlations: Theory and algorithm. CoRR arXiv:2309.12706

  • Li, W., Wang, L., Li, W., Agustsson, E., & Van Gool, L. (2017). Webvision database: Visual learning and understanding from web data. arXiv preprint arXiv:1708.02862

  • Li, X., Liu, T., Han, B., Niu, G., & Sugiyama, M. (2021). Provably end-to-end label-noise learning without anchor points. In: Proceedings of ICML, pp 6403–6413

  • Li, Y., Han, H., Shan, S., & Chen, X. (2023b). DISC: learning from noisy labels via dynamic instance-specific selection and correction. In: Proceedings of CVPR, pp 24070–24079

  • Liu, S., Niles-Weed, J., Razavian, N., & Fernandez-Granda, C. (2020). Early-learning regularization prevents memorization of noisy labels. In: Proceedings of NeurIPS, pp 20331–20342

  • Liu, T., & Tao, D. (2016). Classification with noisy labels by importance reweighting. IEEE Trans Pattern Anal Mach Intell, 38(3), 447–461.

  • Liu, Y., & Guo, H. (2020). Peer loss functions: Learning from noisy labels without knowing noise rates. In: Proceedings of ICML, pp 6226–6236

  • Müller, R., Kornblith, S., & Hinton, G. E. (2019). When does label smoothing help? In: Proceedings of NeurIPS, pp 4696–4705

  • Natarajan, N., Dhillon, I. S., Ravikumar, P., & Tewari, A. (2013). Learning with noisy labels. In: Proceedings of NeurIPS, pp 1196–1204

  • Ortego, D., Arazo, E., Albert, P., O’Connor, N. E., & McGuinness, K. (2021). Multi-objective interpolation training for robustness to label noise. In: Proceedings of CVPR, pp 6606–6615

  • Patrini, G., Rozza, A., Menon, A. K., Nock, R., & Qu, L. (2017). Making deep neural networks robust to label noise: A loss correction approach. In: Proceedings of CVPR, pp 2233–2241

  • Rasmussen, C. (1999). The infinite gaussian mixture model. In: Proceedings of NeurIPS, pp 554–560

  • Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In: Proceedings of CVPR, pp 779–788

  • Scott, C. (2012). Calibrated asymmetric surrogate losses. Electronic J of Stat, 6, 958–992.

  • Staerman, G., Laforgue, P., Mozharovskyi, P., & d’Alché Buc, F. (2021). When ot meets mom: Robust estimation of wasserstein distance. In: Proceedings of AISTATS, pp 136–144

  • Sun, H., Wei, Q., Feng, L., Hu, Y., Liu, F., Fan, H., & Yin, Y. (2024). Variational rectification inference for learning with noisy labels. Int J Comput Vis, 1(18), 1–20.

  • Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2015). Going deeper with convolutions. In: Proceedings of CVPR, pp 1–9

  • Tanaka, D., Ikami, D., Yamasaki, T., & Aizawa, K. (2018). Joint optimization framework for learning with noisy labels. In: Proceedings of CVPR, pp 5552–5560

  • Tarvainen, A., & Valpola, H. (2017). Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In: Proceedings of NeurIPS, pp 1195–1204

  • Wang, J., Xia, X., Lan, L., Wu, X., Yu, J., Yang, W., Han, B., & Liu, T. (2024). Tackling noisy labels with network parameter additive decomposition. IEEE Trans Pattern Anal Mach Intell, 46(9), 6341–6354.

  • Wang, Y., Chen, H., Heng, Q., Hou, W., Fan, Y., Wu, Z., Wang, J., Savvides, M., Shinozaki, T., Raj, B., Schiele, B., & Xie, X. (2023). Freematch: Self-adaptive thresholding for semi-supervised learning. In: Proceedings of ICLR

  • Wei, H., Feng, L., Chen, X., & An, B. (2020). Combating noisy labels by agreement: A joint training method with co-regularization. In: Proceedings of CVPR, pp 13723–13732

  • Wei, J., Liu, H., Liu, T., Niu, G., Sugiyama, M., & Liu, Y. (2022a). To smooth or not? when label smoothing meets noisy labels. In: Proceedings of ICML, pp 23589–23614

  • Wei, J., Zhu, Z., Cheng, H., Liu, T., Niu, G., & Liu, Y. (2022b). Learning with noisy labels revisited: A study using real-world human annotations. In: Proceedings of ICLR

  • Xia, X., Liu, T., Wang, N., Han, B., Gong, C., Niu, G., & Sugiyama, M. (2019). Are anchor points really indispensable in label-noise learning? In: Proceedings of NeurIPS, pp 6835–6846

  • Xia, X., Liu, T., Han, B., Wang, N., Gong, M., Liu, H., Niu, G., Tao, D., & Sugiyama, M. (2020). Part-dependent label noise: Towards instance-dependent label noise. In: Proceedings of NeurIPS, pp 7597–7610

  • Xia, X., Liu, T., Han, B., Gong, C., Wang, N., Ge, Z., & Chang, Y. (2021). Robust early-learning: Hindering the memorization of noisy labels. In: Proceedings of ICLR

  • Xia, X., Liu, T., Han, B., Gong, M., Yu, J., Niu, G., & Sugiyama, M. (2022). Sample selection with uncertainty of losses for learning with noisy labels. In: Proceedings of ICLR

  • Xia, X., Han, B., Zhan, Y., Yu, J., Gong, M., Gong, C., & Liu, T. (2023). Combating noisy labels with sample selection by mining high-discrepancy examples. In: Proceedings of ICCV, pp 1833–1843

  • Xiao, T., Xia, T., Yang, Y., Huang, C., & Wang, X. (2015). Learning from massive noisy labeled data for image classification. In: Proceedings of CVPR, pp 2691–2699

  • Xu, Y., Cao, P., Kong, Y., & Wang, Y. (2019). L_dmi: A novel information-theoretic loss function for training deep nets robust to label noise. In: Proceedings of NeurIPS, pp 6222–6233

  • Xu, Y., Niu, X., Yang, J., Drew, S., Zhou, J., & Chen, R. (2023). USDNL: uncertainty-based single dropout in noisy label learning. In: Proceedings of AAAI, pp 10648–10656

  • Yang, S., Yang, E., Han, B., Liu, Y., Xu, M., Niu, G., & Liu, T. (2022). Estimating instance-dependent bayes-label transition matrix using a deep neural network. Proceedings of ICML, 162, 25302–25312.

  • Yang, S., Wu, S., Yang, E., Han, B., Liu, Y., Xu, M., Niu, G., & Liu, T. (2023). A parametrical model for instance-dependent label noise. IEEE Trans Pattern Anal Mach Intell, 45(12), 14055–14068.

  • Yao, Y., Sun, Z., Zhang, C., Shen, F., Wu, Q., Zhang, J., & Tang, Z. (2021). Jo-SRC: A contrastive approach for combating noisy labels. In: Proceedings of CVPR, pp 5192–5201

  • Yi, K., & Wu, J. (2019). Probabilistic end-to-end noise correction for learning with noisy labels. In: Proceedings of CVPR, pp 7017–7025

  • Yi, R., Guan, D., Huang, Y., & Lu, S. (2023). Class-independent regularization for learning with noisy labels. In: Proceedings of AAAI, pp 3276–3284

  • Yu, X., Han, B., Yao, J., Niu, G., Tsang, I., & Sugiyama, M. (2019a). How does disagreement help generalization against label corruption? In: Proceedings of ICML, pp 7164–7173

  • Yu, X., Han, B., Yao, J., Niu, G., Tsang, I. W., & Sugiyama, M. (2019b). How does disagreement help generalization against label corruption? In: Proceedings of ICML, pp 7164–7173

  • Zhang, C., Bengio, S., Hardt, M., Recht, B., & Vinyals, O. (2017). Understanding deep learning requires rethinking generalization. In: Proceedings of ICLR

  • Zhang, C. B., Jiang, P. T., Hou, Q., Wei, Y., Han, Q., Li, Z., & Cheng, M. M. (2021). Delving deep into label smoothing. IEEE Trans Image Process, 30, 5984–5996.

  • Zhang, H., Dana, K., Shi, J., Zhang, Z., Wang, X., Tyagi, A., & Agrawal, A. (2018). Context encoding for semantic segmentation. In: Proceedings of CVPR, pp 7151–7160

  • Zhang, J., Song, B., Wang, H., Han, B., Liu, T., Liu, L., & Sugiyama, M. (2024). Badlabel: A robust perspective on evaluating and enhancing label-noise learning. IEEE Trans Pattern Anal Mach Intell, 46(6), 4398–4409.

  • Zhang, X., Zou, J., He, K., & Sun, J. (2015). Accelerating very deep convolutional networks for classification and detection. IEEE Trans Pattern Anal Mach Intell, 38(10), 1943–1955.

  • Zhang, Y., Niu, G., & Sugiyama, M. (2021b). Learning noise transition matrix from only noisy labels via total variation regularization. In: Proceedings of ICML, pp 12501–12512

  • Zhu, Z., Liu, T., & Liu, Y. (2021). A second-order approach to learning with instance-dependent label noise. In: Proceedings of CVPR, pp 10113–10123

Funding

This work was supported in part by the Macau Science and Technology Development Fund under SKLIOTSC-2021-2023, 0022/2022/A1, and 0119/2024/RIB2; in part by the Research Committee at the University of Macau under MYRG-GRG2023-00058-FST-UMDF; and in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2024A1515012536.

Author information

Corresponding author

Correspondence to Jiantao Zhou.

Additional information

Communicated by Bryan Allen Plummer.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Proofs

1.1 Proof of Theorem 1

We first introduce necessary lemmas that will facilitate our proof.

Lemma 1

The classifier \(\hat{f}_{\tau }\) with the threshold \(\tau \) can be expressed as:

$$\begin{aligned} \hat{f}_{\tau }(x) & = \textrm{sign}(\hat{\eta }(x) - \tau )\\ & = \textrm{sign} \left( \eta (x) - \frac{\frac{1}{2}+\frac{1}{\beta }(\tau -\frac{1}{2})-\rho _{-1}}{1-\rho _{+1}-\rho _{-1}} \right) . \end{aligned}$$

Proof

By definition, we have:

$$ \begin{aligned} \tilde{\eta }(X)&= P(\tilde{Y} = 1, Y = 1 | X) + P(\tilde{Y} = 1, Y = -1 | X) \\&= P(\tilde{Y} = 1 | Y = 1) P(Y = 1 | X)\\&\quad + P(\tilde{Y} = 1 | Y = -1) P(Y = -1 | X) \\&= (1 - \rho _{+1}) \eta (X) + \rho _{-1} (1 - \eta (X)) \\&= (1 - \rho _{+1} - \rho _{-1}) \eta (X) + \rho _{-1}. \end{aligned} $$

Therefore,

$$\begin{aligned} \hat{f}_{\tau }(x) & =\textrm{sign}(\hat{\eta }(x) - \tau )\\ & =\textrm{sign}\left( \frac{1}{2}+\beta (\tilde{\eta }(x)-\frac{1}{2})-\tau \right) \\ & =\textrm{sign}\left( \tilde{\eta }(x)-(\frac{1}{2}+\frac{1}{\beta }(\tau -\frac{1}{2}))\right) \\ & =\textrm{sign}\left( (1 - \rho _{+1} - \rho _{-1}) \eta (x) + \rho _{-1}-(\frac{1}{2}+\frac{\tau -\frac{1}{2}}{\beta })\right) \\ & =\textrm{sign} \left( \eta (x) - \frac{\frac{1}{2}+\frac{1}{\beta }(\tau -\frac{1}{2})-\rho _{-1}}{1-\rho _{+1}-\rho _{-1}} \right) . \end{aligned}$$
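
For concreteness, consider hypothetical noise rates \(\rho _{+1}=0.2\) and \(\rho _{-1}=0.4\) with \(\beta =1\) and \(\tau =\frac{1}{2}\) (values chosen only for illustration). Lemma 1 then gives

$$ \hat{f}_{\frac{1}{2}}(x)=\textrm{sign}\left( \eta (x) - \frac{\frac{1}{2}-0.4}{1-0.2-0.4} \right) =\textrm{sign}\left( \eta (x)-0.25\right) , $$

which can be verified directly: with \(\beta =1\), \(\hat{\eta }(x)=\tilde{\eta }(x)=0.4\,\eta (x)+0.4\), which exceeds \(\frac{1}{2}\) exactly when \(\eta (x)>0.25\).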

Lemma 2

(\(\alpha \)-weighted Bayes Optimal (Scott, 2012)) Define the \(U_{\alpha }\)-risk under distribution \(D\) as

$$ R_{\alpha ,D}(f)=\mathbb {E}_{(X,Y) \sim D}[U_{\alpha }(f(X),Y)]. $$

Then, \(f^{*}_{\alpha }=\textrm{sign}(\eta (x)-\alpha )\) minimizes the \(U_{\alpha }\)-risk.

Now we proceed to prove Theorem 1 and Corollary 1.

Proof

By Lemma 1, the optimal threshold \(\tau ^*\) is expected to satisfy:

$$ \frac{\frac{1}{2}+\frac{1}{\beta }(\tau -\frac{1}{2})-\rho _{-1}}{1-\rho _{+1}-\rho _{-1}} = \frac{1}{2}. $$

Solving this equation yields \(\tau ^*=\frac{1}{2}+\frac{\beta }{2}(\rho _{-1}-\rho _{+1})\).

Applying Lemma 2, it follows directly that the \(\alpha \)-weighted Bayes classifier under the noisy distribution is:

$$ \tilde{f}_{\alpha }^*={{\,\mathrm{arg\,min}\,}}_f R_{\alpha ,D_{\rho }}(f)=\textrm{sign}(\tilde{\eta }(x)-\alpha ). $$

Combining this with the result from Lemma 1, we observe that when

$$ \alpha =\frac{1}{2}+\frac{1}{\beta }(\tau -\frac{1}{2}), $$

it holds that \(\hat{f}_{\tau }=\tilde{f}_{\alpha }^*\). By selecting the optimal threshold \(\tau ^*\), we obtain \(\alpha ^*=\frac{1-\rho _{+1}+\rho _{-1}}{2}\), as required.
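
As a quick numerical check with the same illustrative values \(\rho _{+1}=0.2\), \(\rho _{-1}=0.4\), and \(\beta =1\) (hypothetical, used only to make the formulas concrete),

$$ \tau ^*=\frac{1}{2}+\frac{1}{2}(0.4-0.2)=0.6 \quad \textrm{and}\quad \alpha ^*=\frac{1-0.2+0.4}{2}=0.6, $$

and indeed \(\alpha ^*=\frac{1}{2}+\frac{1}{\beta }(\tau ^*-\frac{1}{2})=0.6\), consistent with the relation between \(\hat{f}_{\tau }\) and \(\tilde{f}_{\alpha }^*\) above.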

1.2 Proof of Theorem 2

Proof

For convenience, we denote \(\ell _{\textrm{RML}}(f(\varvec{x}),\tilde{y})\) by \(\ell _{\textrm{RML}}\). With the regroup number \(n\) in RML++, the event \(\left\{ |\ell _{\textrm{RML}}-\hat{\mu }|>\epsilon \right\} \) implies that at least \(\frac{n+1}{2}\) of the mean losses \(\ell ^s_i\) in \(\mathcal {W}^{*}\) lie farther than \(\epsilon \) from \(\hat{\mu }\). Namely,

$$ \left\{ |\ell _{\textrm{RML}}-\hat{\mu }|>\epsilon \right\} \subset \left\{ \sum ^{n+1}_{i=1} \mathbb {1}(|\ell ^s_i-\hat{\mu }|>\epsilon )\geqslant \frac{n+1}{2}\right\} . $$

Define \(Z_i=\mathbb {1}(|\ell ^s_i-\hat{\mu }|>\epsilon )\). Let \(p_{\epsilon ,k}=\mathbb {E}[Z_i]=P\left( |\ell ^s_i-\hat{\mu }|>\epsilon \right) \) for \(i=1,\dots ,n\), and \(p_{\epsilon }=\mathbb {E}[Z_{n+1}]=P\left( |\ell -\hat{\mu }|>\epsilon \right) \). Then we obtain

$$\begin{aligned} & P\left( |\ell _\textrm{RML}-\hat{\mu }|>\epsilon \right) \leqslant P\left( \sum ^{n+1}_{i=1} Z_i \geqslant \frac{n+1}{2}\right) \\ & = P\left( \sum ^{n+1}_{i=1} (Z_i-\mathbb {E}[Z_i]) \geqslant \frac{n+1}{2}-n p_{\epsilon ,k}-p_{\epsilon }\right) \\ & =P\left( \frac{1}{n+1}\sum ^{n+1}_{i=1} (Z_i-\mathbb {E}[Z_i]) \geqslant \frac{1}{2}-\frac{n}{n+1}p_{\epsilon ,k}-\frac{1}{n+1}p_{\epsilon }\right) . \end{aligned}$$

Because the \(Z_i\) are independent random variables bounded by 1, by Hoeffding's inequality (one-sided), for any \(\tau >0\), we obtain

$$\begin{aligned}P\left( \frac{1}{n+1}\sum ^{n+1}_{i=1} (Z_i-\mathbb {E}[Z_i]) \geqslant \tau \right) \leqslant \textrm{e}^{-2(n+1)\tau ^2}. \end{aligned}$$

As a result,

$$\begin{aligned} & P\left( |\ell _{\textrm{RML}}-\hat{\mu }|>\epsilon \right) \\ & \leqslant P\left( \frac{1}{n+1}\sum ^{n+1}_{i=1} (Z_i-\mathbb {E}[Z_i]) \geqslant \frac{1}{2}-\frac{n}{n+1}p_{\epsilon ,k}-\frac{1}{n+1}p_{\epsilon }\right) \\ & \leqslant \textrm{e}^{-2(n+1)(\frac{1}{2}-\frac{n}{n+1}p_{\epsilon ,k}-\frac{1}{n+1}p_{\epsilon })^2}. \end{aligned}$$

Since \(\textrm{Var}_{\varvec{X}^{*}|\tilde{Y}=\tilde{y}}[\ell (f(\varvec{X}^{*}),\tilde{y})]=\hat{\sigma }^2<\infty \), Chebyshev's inequality yields

$$\begin{aligned} p_{\epsilon ,k} & =P(|\ell ^s_i-\hat{\mu }|>\epsilon )\leqslant \frac{\hat{\sigma }^2}{k\epsilon ^2} \ \text {and} \\ p_{\epsilon } & =P(|\ell -\hat{\mu }|>\epsilon )\leqslant \frac{\hat{\sigma }^2}{\epsilon ^2}. \end{aligned}$$

Then the bound is

$$ \begin{aligned} P\left( |\ell _{\textrm{RML}}-\hat{\mu }|>\epsilon \right)&\leqslant \textrm{e}^{-2(n+1)(\frac{1}{2}-\frac{n}{n+1}p_{\epsilon ,k}-\frac{1}{n+1}p_{\epsilon })^2}\\&\leqslant \textrm{e}^{-2(n+1)(\frac{1}{2}-\frac{n}{n+1}\frac{\hat{\sigma }^2}{k\epsilon ^2}-\frac{1}{n+1}\frac{\hat{\sigma }^2}{\epsilon ^2})^2}\\&=\textrm{e}^{-2(n+1)(\frac{1}{2}-\frac{n+k}{n+1}\frac{\hat{\sigma }^2}{k\epsilon ^2})^2}\\&=\textrm{e}^{-\mathrm {C_1}(\frac{1}{2}-\mathrm {C_2}\frac{\hat{\sigma }^2}{\epsilon ^2})^2}, \end{aligned} $$

where \(\mathrm {C_1}=2(n+1)\) and \(\mathrm {C_2}=\frac{n+k}{k(n+1)}>0\). \(\square \)
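
To give a sense of the scale of this bound, take illustrative values \(n=9\), \(k=10\), \(\hat{\sigma }^2=0.01\), and \(\epsilon =0.3\) (chosen only for illustration, not taken from the experiments). Then \(\mathrm {C_1}=20\), \(\mathrm {C_2}=0.19\), and

$$ P\left( |\ell _{\textrm{RML}}-\hat{\mu }|>\epsilon \right) \leqslant \textrm{e}^{-20\left( \frac{1}{2}-0.19\cdot \frac{0.01}{0.09}\right) ^2}\approx \textrm{e}^{-4.59}\approx 0.010, $$

which is roughly an order of magnitude smaller than the plain Chebyshev bound \(p_{\epsilon }\leqslant \hat{\sigma }^2/\epsilon ^2\approx 0.11\) for a single loss.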

Remark 2

Note that after the operations in Eqs. (1) and (5), the losses are intuitively closer to the true loss, so the number of \(\ell _i^s\) lying farther than \(\epsilon \) from \(\hat{\mu }\) decreases. Therefore, \(\sum ^{n+1}_{i=1} \mathbb {1}\left( |\ell ^s_i-\hat{\mu }|>\epsilon \right) \) decreases and so does \(P\left( \sum ^{n+1}_{i=1} Z_i \geqslant \frac{n+1}{2}\right) \), i.e., the upper bound in Theorem 2 is reduced.
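
The informal argument in Remark 2 can also be illustrated numerically. The short simulation below is a minimal sketch under assumed loss distributions (Gaussian clean same-class losses and an exponential outlier standing in for a corrupted loss), not the authors' implementation; the constants n, k, and the distribution parameters are hypothetical.

import numpy as np

rng = np.random.default_rng(0)
n, k = 9, 10          # regroup number and group size (illustrative values)
mu_hat = 0.5          # assumed expected loss of clean same-class samples
trials = 10_000

single_dev, median_dev = [], []
for _ in range(trials):
    outlier_loss = rng.exponential(scale=5.0)         # loss of a mislabeled sample
    groups = rng.normal(mu_hat, 0.2, size=(n, k))     # clean same-class losses, regrouped
    group_means = groups.mean(axis=1)
    # Median over the n group means and the outlying loss itself (n + 1 values).
    rml_estimate = np.median(np.append(group_means, outlier_loss))
    single_dev.append(abs(outlier_loss - mu_hat))
    median_dev.append(abs(rml_estimate - mu_hat))

print(f"mean deviation of the raw loss:       {np.mean(single_dev):.3f}")
print(f"mean deviation of the regroup median: {np.mean(median_dev):.3f}")

The regroup-median estimate stays close to \(\hat{\mu }\) even when the raw loss is far from it, which is exactly the effect the reduced upper bound in Theorem 2 quantifies.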

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, F., Li, K., Wang, Q. et al. RML++: Regroup Median Loss for Combating Label Noise. Int J Comput Vis 133, 6400–6421 (2025). https://doi.org/10.1007/s11263-025-02494-4

  • Received:

  • Accepted:

  • Published:

  • Version of record:

  • Issue date:

  • DOI: https://doi.org/10.1007/s11263-025-02494-4

Keywords