Corpus ID: 249097375

Mitigating Memorization of Noisy Labels via Regularization between Representations

Hao Cheng, Zhaowei Zhu, Xing Sun, Yang Liu
Designing robust loss functions is popular in learning with noisy labels, but existing designs do not explicitly consider the overfitting property of deep neural networks (DNNs). As a result, applying these losses may still lead to overfitting/memorizing noisy labels as training proceeds. In this paper, we first theoretically analyze the memorization effect and show that a lower-capacity model may perform better on noisy datasets. However, it is non-trivial to design a neural network with the… 

Relational Knowledge Distillation
RKD proposes distance-wise and angle-wise distillation losses that penalize structural differences in the relations among examples; it allows students to outperform their teachers, achieving state-of-the-art results on standard benchmark datasets.
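To make the distance-wise idea concrete, here is a minimal sketch of an RKD-style loss: it compares the normalized pairwise-distance structure of teacher and student embeddings. The function and variable names are assumptions, and an L1 penalty stands in for the Huber loss used in the paper.

```python
import numpy as np

def rkd_distance_loss(teacher_emb, student_emb):
    """Distance-wise relational distillation loss (sketch): penalize
    differences between the pairwise-distance structures of teacher and
    student embeddings, each normalized by its mean nonzero distance."""
    def normalized_pairwise(e):
        d = np.linalg.norm(e[:, None, :] - e[None, :, :], axis=-1)
        return d / d[d > 0].mean()  # scale-invariant distance structure
    t = normalized_pairwise(teacher_emb)
    s = normalized_pairwise(student_emb)
    return np.mean(np.abs(t - s))   # L1 here; the paper uses a Huber loss
```

Because both distance matrices are normalized, the student only needs to match the teacher's relational structure, not its embedding scale.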
Co-teaching: Robust training of deep neural networks with extremely noisy labels
Empirical results on noisy versions of MNIST, CIFAR-10, and CIFAR-100 demonstrate that Co-teaching is far superior to state-of-the-art methods in the robustness of trained deep models.
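The core of Co-teaching is a small-loss sample exchange: each of two networks selects its small-loss samples in a batch and hands them to its peer for the update. A minimal sketch of that selection step, with `forget_rate` (the fraction of samples dropped per batch) as an assumed hyperparameter:

```python
import numpy as np

def coteach_select(loss_a, loss_b, forget_rate):
    """Co-teaching sample exchange (sketch): each network keeps the
    small-loss samples chosen by its peer for its own parameter update."""
    n_keep = int(len(loss_a) * (1.0 - forget_rate))
    keep_for_b = np.argsort(loss_a)[:n_keep]  # A's small-loss picks train B
    keep_for_a = np.argsort(loss_b)[:n_keep]  # B's small-loss picks train A
    return keep_for_a, keep_for_b
```

Exchanging selections between two differently initialized networks is what keeps each network from reinforcing its own memorization of noisy labels.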
Classification with Noisy Labels by Importance Reweighting
  • Tongliang Liu, D. Tao
  • Computer Science
    IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 2016
It is proved that any surrogate loss function can be used for classification with noisy labels via importance reweighting, with the consistency assurance that label noise does not ultimately hinder the search for the optimal classifier of the noise-free sample.
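For the binary class-conditional noise model studied in this line of work, the importance weight for an example can be sketched as below. The function and argument names are assumptions; `rho_pos` and `rho_neg` denote the flip rates from the positive and negative class, and `p_noisy_y_given_x` is the model's probability of the observed noisy label.

```python
def importance_weight(p_noisy_y_given_x, rho_pos, rho_neg, y):
    """Importance weight beta(x, y) for binary classification under
    class-conditional label noise (sketch). With the observed label y in
    {+1, -1}, the weight discounts examples whose observed label is
    likely to be a flip."""
    rho = rho_neg if y == 1 else rho_pos  # flip rate INTO the observed label
    return (p_noisy_y_given_x - rho) / (
        (1.0 - rho_pos - rho_neg) * p_noisy_y_given_x)
```

When both flip rates are zero, every weight reduces to 1 and standard risk minimization is recovered.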
Learning with Instance-Dependent Label Noise: A Sample Sieve Approach
This paper proposes CORES^2 (COnfidence REgularized Sample Sieve), which progressively sieves out corrupted samples, providing generic machinery for anatomizing noisy datasets and a flexible interface for various robust-training techniques to further improve performance.
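In the spirit of that sample sieve, a simplified sketch: subtract a confidence regularizer from each per-sample loss and keep the samples whose adjusted loss is small. The mean-based threshold and the `beta` coefficient here are illustrative assumptions, not the paper's exact dynamic threshold.

```python
import numpy as np

def sample_sieve(losses, probs, beta=1.0):
    """Confidence-regularized sample sieve (simplified sketch): samples
    whose regularized loss falls below the batch average are treated as
    likely clean and kept for training."""
    # Regularizer rewards confident predictions (low average entropy-like term).
    conf_reg = -np.mean(np.log(probs + 1e-12), axis=1)
    adjusted = losses - beta * conf_reg
    return np.where(adjusted < adjusted.mean())[0]  # indices of kept samples
```

The intuition is that corrupted samples tend to incur large losses that the confidence term cannot offset, so they fall on the discarded side of the threshold.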
Deep Residual Learning for Image Recognition
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
A Second-Order Approach to Learning with Instance-Dependent Label Noise
This work proposes and studies the potential of a second-order approach that leverages the estimation of several covariance terms defined between the instance-dependent noise rates and the Bayes optimal label, and shows that this set of second-order statistics successfully captures the induced imbalances.
Deep Learning From Multiple Noisy Annotators as A Union.
This article proposes a novel method named UnionNet, which takes all the labeling information as a union to coordinate multiple annotators; it directly trains an end-to-end deep neural network by maximizing the likelihood of this union with only a parametric transition matrix.
Robust Training under Label Noise by Over-parameterization
This work proposes a principled approach for robust training of over-parameterized deep networks in classification tasks where a proportion of training labels are corrupted, and demonstrates state-of-the-art test accuracy against label noise on a variety of real datasets.
Learning with Noisy Labels Revisited: A Study Using Real-World Human Annotations
This work presents two new benchmark datasets, CIFAR-10N and CIFAR-100N, shows quantitatively and qualitatively that real-world noisy labels follow an instance-dependent pattern rather than the classically assumed and adopted ones (e.g., class-dependent label noise), and starts an effort to benchmark a subset of the existing solutions on them.
Robust early-learning: Hindering the memorization of noisy labels
The memorization effect of deep networks shows that they first memorize training data with clean labels and then those with noisy labels. The early stopping method can therefore be exploited for… 