Learning from Noisy Labels with Deep Neural Networks: A Survey

@article{Song2022LearningFN,
  title={Learning from Noisy Labels with Deep Neural Networks: A Survey},
  author={Hwanjun Song and Minseok Kim and Dongmin Park and Jae-Gil Lee},
  journal={IEEE Transactions on Neural Networks and Learning Systems},
  year={2022},
  volume={PP}
}
Deep learning has achieved remarkable success in numerous domains with the help of large amounts of data. However, the quality of data labels is a concern because of the lack of high-quality labels in many real-world scenarios. As noisy labels severely degrade the generalization performance of deep neural networks, learning from noisy labels (robust training) is becoming an important task in modern deep learning applications. In this survey, we first describe the problem of learning with…

Citations

Asymmetric Loss Functions for Learning with Noisy Labels

TLDR
The asymmetry ratio is introduced to measure the asymmetry of a loss function, and empirical results show that a higher ratio provides better noise tolerance.

Learning from data in the mixed adversarial non-adversarial case: Finding the helpers and ignoring the trolls

TLDR
This work proposes and analyses several mitigating learning algorithms that identify trolls either at the example or the user level, and finds that user-based methods, which account for troll users exhibiting adversarial behavior across multiple examples, work best in a variety of settings on their benchmark.

A Survey on Classifying Big Data with Label Noise

TLDR
An extensive literature review on treating label noise within big data is presented, covering 30 methods for treating class label noise across a range of big data contexts, i.e., high-volume, high-variety, and high-velocity problems.

Noisy Label Learning for Security Defects

TLDR
A two-stage learning method based on noise cleaning is proposed to identify and remediate noisy samples; it improves the AUC and recall of baselines by up to 8.9% and 23.4%, respectively, and shows that learning from noisy labels can be effective for data-driven software and security analytics.

A Label Management Mechanism for Retinal Fundus Image Classification of Diabetic Retinopathy

TLDR
This work proposes a novel label management mechanism (LMM) that prevents DNNs from overfitting to noisy data and demonstrates that LMM boosts model performance and is superior to three state-of-the-art methods.

Detecting Label Errors using Pre-Trained Language Models

We show that large pre-trained language models are extremely capable of identifying label errors in datasets: simply verifying data points in descending order of out-of-distribution loss significantly…
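
To make the ranking idea concrete, below is a minimal sketch of loss-based label-error triage in the spirit of the paper above, assuming per-example losses computed by a model that never trained on the examples it scores (e.g., via cross-validation); the function and variable names are illustrative, not the paper's API.

```python
import numpy as np

def rank_suspected_label_errors(per_example_loss, top_k=100):
    """Return indices of the examples most likely to be mislabeled:
    the higher the held-out loss, the more suspicious the label."""
    order = np.argsort(per_example_loss)[::-1]  # sort indices by descending loss
    return order[:top_k]

# Usage (illustrative): losses = np.array([...]) from a held-out model pass;
# human annotators then re-check dataset[rank_suspected_label_errors(losses)] first.
```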

Robust Product Classification with Instance-Dependent Noise

TLDR
A simple yet effective deep neural network is developed for product title classification to serve as a base classifier, and a novel noise simulation algorithm based on product title similarity is proposed.

ASSIST: Towards Label Noise-Robust Dialogue State Tracking

TLDR
This paper proposes a general framework, named ASSIST (lAbel noiSe-robuSt dIalogue State Tracking), to train DST models robustly from noisy labels, and shows the validity of ASSIST theoretically.

Robustness and reliability when training with noisy labels

TLDR
The observed robustness of common training practices, such as early stopping, to label noise is explained, and the development of new noise-robust algorithms that not only preserve accuracy but also ensure reliability is encouraged.
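
As a concrete illustration of the early-stopping heuristic mentioned above, here is a minimal sketch assuming a clean (or at least cleaner) validation set and caller-supplied `train_epoch()` and `evaluate()` callables; deep networks tend to fit dominant clean patterns before memorizing noisy labels, so stopping at the validation peak limits memorization.

```python
def train_with_early_stopping(train_epoch, evaluate, max_epochs=100, patience=5):
    """`train_epoch()` runs one epoch of training; `evaluate()` returns
    validation accuracy. Both are assumed to be supplied by the caller."""
    best_acc, best_epoch, bad_epochs = 0.0, 0, 0
    for epoch in range(max_epochs):
        train_epoch()
        acc = evaluate()
        if acc > best_acc:
            best_acc, best_epoch, bad_epochs = acc, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # validation accuracy plateaued: likely entering the memorization phase
    return best_epoch, best_acc
```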

How to distribute data across tasks for meta-learning?

TLDR
The results provide guidance for allocating labels across tasks when collecting data for meta-learning, and the same conclusions are shown to hold for few-shot image classification on CIFAR-FS and mini-ImageNet.
...

References

Showing 1-10 of 189 references

Learning with Noisy Labels Revisited: A Study Using Real-World Human Annotations

TLDR
This work presents two new benchmark datasets, CIFAR-10N and CIFAR-100N, shows quantitatively and qualitatively that real-world noisy labels follow an instance-dependent pattern rather than the classically assumed class-dependent one, and starts an effort to benchmark a subset of existing solutions using these datasets.

Learning From Noisy Labels by Regularized Estimation of Annotator Confusion

TLDR
This work presents a method for simultaneously learning the individual annotator model and the underlying true label distribution, using only noisy observations, and proposes to add a regularization term to the loss function that encourages convergence to the true annotator confusion matrix.
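
A minimal sketch of trace-regularized annotator confusion estimation in the spirit of this method, assuming one learnable confusion matrix per annotator and a PyTorch classifier producing `true_probs`; the row-stochastic softmax parametrization and the regularizer weight `lam` are illustrative assumptions, not necessarily the paper's exact choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AnnotatorConfusion(nn.Module):
    def __init__(self, num_annotators, num_classes):
        super().__init__()
        # One learnable confusion matrix per annotator, initialized near the identity.
        self.logits = nn.Parameter(torch.eye(num_classes).repeat(num_annotators, 1, 1) * 4.0)

    def forward(self, true_probs, annotator_ids):
        # Row-stochastic matrices: C[r, i, j] ~ P(annotator r says j | true class i).
        C = F.softmax(self.logits, dim=2)[annotator_ids]          # (batch, K, K)
        noisy_probs = torch.bmm(true_probs.unsqueeze(1), C).squeeze(1)
        trace_reg = C.diagonal(dim1=1, dim2=2).sum(dim=1).mean()  # trace regularizer
        return noisy_probs, trace_reg

# Training (illustrative): minimize
#   F.nll_loss(torch.log(noisy_probs + 1e-8), noisy_label) + lam * trace_reg
```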

Co-teaching: Robust training of deep neural networks with extremely noisy labels

TLDR
Empirical results on noisy versions of MNIST, CIFAR-10, and CIFAR-100 demonstrate that Co-teaching is much superior to state-of-the-art methods in the robustness of trained deep models.
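
A minimal sketch of the small-loss selection and cross-update step at the core of Co-teaching, assuming two PyTorch classifiers `net_a` and `net_b` with optimizers `opt_a` and `opt_b`; the schedule that gradually decreases `remember_rate` over epochs is omitted here.

```python
import torch
import torch.nn.functional as F

def co_teaching_step(net_a, net_b, opt_a, opt_b, x, y, remember_rate):
    """One mini-batch update: each network keeps its small-loss samples
    and passes them to its peer for the gradient step."""
    with torch.no_grad():
        # Per-sample losses (no reduction) used to rank "clean-looking" samples.
        loss_a = F.cross_entropy(net_a(x), y, reduction="none")
        loss_b = F.cross_entropy(net_b(x), y, reduction="none")

    num_keep = int(remember_rate * x.size(0))
    idx_a = torch.argsort(loss_a)[:num_keep]  # samples net A trusts
    idx_b = torch.argsort(loss_b)[:num_keep]  # samples net B trusts

    # Cross-update: A learns from B's selection and vice versa.
    opt_a.zero_grad()
    F.cross_entropy(net_a(x[idx_b]), y[idx_b]).backward()
    opt_a.step()

    opt_b.zero_grad()
    F.cross_entropy(net_b(x[idx_a]), y[idx_a]).backward()
    opt_b.step()
```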

CleanNet: Transfer Learning for Scalable Image Classifier Training with Label Noise

TLDR
CleanNet, a joint neural embedding network, is introduced; it requires only a fraction of the classes to be manually verified to provide knowledge of label noise that can be transferred to other classes, reducing the label-noise detection error rate on held-out classes where no human supervision is available.

Dual T: Reducing Estimation Error for Transition Matrix in Label-noise Learning

TLDR
This paper introduces an intermediate class to avoid directly estimating the noisy class posterior when estimating the transition matrix, and proposes the dual T-estimator for estimating transition matrices, leading to better classification performance.

Symmetric Cross Entropy for Robust Learning With Noisy Labels

TLDR
The proposed Symmetric cross entropy Learning (SL) approach simultaneously addresses both the under-learning and overfitting problems of cross entropy (CE) in the presence of noisy labels, and it is shown empirically that SL outperforms state-of-the-art methods.
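
A minimal sketch of the Symmetric Cross Entropy loss described above, assuming PyTorch logits; the weights `alpha` and `beta` and the log-zero clamp `A` below are illustrative defaults rather than the paper's tuned values.

```python
import torch
import torch.nn.functional as F

def symmetric_cross_entropy(logits, targets, alpha=0.1, beta=1.0, A=-4.0):
    """CE pulls predictions toward the (possibly noisy) labels; the reverse CE
    term stays bounded because log(0) in the one-hot label is replaced by A."""
    ce = F.cross_entropy(logits, targets)

    pred = F.softmax(logits, dim=1)
    one_hot = F.one_hot(targets, logits.size(1)).float()
    # Reverse cross entropy: -sum_k p(k|x) * log y_k, with log 0 := A and log 1 = 0.
    log_labels = torch.where(one_hot > 0,
                             torch.zeros_like(one_hot),
                             torch.full_like(one_hot, A))
    rce = -(pred * log_labels).sum(dim=1).mean()

    return alpha * ce + beta * rce
```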

Are Anchor Points Really Indispensable in Label-Noise Learning?

TLDR
Empirical results on benchmark-simulated and real-world label-noise datasets demonstrate that, without using exact anchor points, the proposed method is superior to state-of-the-art label-noise learning methods.

Gradient Descent with Early Stopping is Provably Robust to Label Noise for Overparameterized Neural Networks

TLDR
Under a rich dataset model, it is shown that gradient descent is provably robust to noise/corruption on a constant fraction of the labels despite overparameterization, shedding light on the empirical robustness of deep networks as well as commonly adopted heuristics to prevent overfitting.

Training Deep Neural Networks on Noisy Labels with Bootstrapping

TLDR
A generic way to handle noisy and incomplete labeling is proposed by augmenting the prediction objective with a notion of consistency: a prediction is considered consistent if the same prediction is made for similar percepts, where similarity is measured between deep network features computed from the input data.
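
A minimal sketch of the "soft bootstrapping" variant of this objective, assuming PyTorch logits; `beta` controls how much the given (possibly noisy) label is trusted relative to the network's own current prediction.

```python
import torch
import torch.nn.functional as F

def soft_bootstrap_loss(logits, targets, beta=0.95):
    """Cross entropy against a convex combination of the given label and the
    model's own prediction, which rewards perceptually consistent predictions."""
    log_probs = F.log_softmax(logits, dim=1)
    probs = log_probs.exp().detach()                 # stop gradient through the soft target
    one_hot = F.one_hot(targets, logits.size(1)).float()
    soft_target = beta * one_hot + (1.0 - beta) * probs
    return -(soft_target * log_probs).sum(dim=1).mean()
```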

Training Convolutional Networks with Noisy Labels

TLDR
An extra noise layer is introduced into the network that adapts the network outputs to match the noisy label distribution; its parameters can be estimated as part of the training process and require only simple modifications to current training infrastructures for deep networks.
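
A minimal sketch of a learnable noise-adaptation layer in this spirit, assuming a PyTorch base classifier that outputs probabilities over clean classes; the softmax parametrization of the transition matrix and the near-identity initialization are assumptions of this sketch rather than the paper's exact construction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoiseAdaptationLayer(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        # Initialize near the identity so the layer starts out as "no noise".
        self.transition_logits = nn.Parameter(torch.eye(num_classes) * 5.0)

    def forward(self, clean_probs):
        # Row-stochastic matrix: T[i, j] ~ P(noisy label j | clean label i).
        T = F.softmax(self.transition_logits, dim=1)
        return clean_probs @ T  # predicted distribution over *noisy* labels

# Training (illustrative): fit against the observed noisy labels, e.g.
#   noisy_probs = noise_layer(F.softmax(base_net(x), dim=1))
#   loss = F.nll_loss(torch.log(noisy_probs + 1e-8), noisy_y)
```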
...