A Survey on Deep Learning with Noisy Labels: How to train your model when you cannot trust on the annotations?

  • F. R. Cordeiro, G. Carneiro
  • Published 1 November 2020
  • Computer Science
  • 2020 33rd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)
Noisy labels are commonly present in datasets collected automatically from the internet, mislabeled by non-specialist annotators, or even by specialists in challenging tasks, such as in the medical field. Although deep learning models have shown significant improvements in different domains, an open issue is their tendency to memorize noisy labels during training, which reduces their generalization potential. As deep learning models depend on correctly labeled datasets and label correctness is…


Class-conditional Importance Weighting for Deep Learning with Noisy Labels
This paper extends the existing Contrast to Divide algorithm, coupled with DivideMix, using a new class-conditional weighting scheme, and proposes a loss-correction method that relies on dynamic weights computed during model training.
Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks
Surprisingly, it is found that lower-capacity models may be practically more useful than higher-capacity models on real-world datasets with high proportions of erroneously labeled data.
TrustNet: Learning from Trusted Data Against (A)symmetric Label Noise
This paper designs TrustNet, which first learns the pattern of noise corruption, whether symmetric or asymmetric, from a small set of trusted data, and is trained via a robust loss function that weights the given labels against the labels inferred from the learned noise pattern.
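The first step described above, learning the corruption pattern from a small trusted set, amounts to estimating a label-transition matrix. A minimal sketch (function and variable names are our own, and the trusted set is assumed to cover every class so no row of counts is empty):

```python
import numpy as np

def estimate_noise_matrix(clean_labels, noisy_labels, num_classes):
    """Estimate the corruption pattern from a small trusted set:
    row i gives the empirical distribution P(observed label j | true label i).
    Assumes every class appears at least once among clean_labels."""
    counts = np.zeros((num_classes, num_classes))
    for c, n in zip(clean_labels, noisy_labels):
        counts[c, n] += 1
    return counts / counts.sum(axis=1, keepdims=True)
```

The resulting matrix is what a robust loss can use to weight given labels against inferred ones; a symmetric noise pattern shows up as a near-uniform off-diagonal, an asymmetric one as mass concentrated on specific class pairs.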
Transfer and Marginalize: Explaining Away Label Noise with Privileged Information
It is argued that privileged information is useful for explaining away label noise, reducing the harmful impact of noisy labels; the proposed method, TRAM (TRansfer and Marginalize), is simple and efficient, with minimal training-time overhead and the same test-time cost as not using privileged information.
The Fault in Our Data Stars: Studying Mitigation Techniques against Faulty Training Data in Machine Learning Applications
It is found that ensemble learning offers the highest resilience among all the techniques across different configurations, followed by label smoothing.
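Label smoothing, the runner-up mitigation above, simply mixes each one-hot target with the uniform distribution over classes; a minimal sketch (function name and epsilon value are our own):

```python
import numpy as np

def smooth_labels(labels, num_classes, eps=0.1):
    """Mix one-hot targets with the uniform distribution: a fraction eps of
    the probability mass is spread evenly over all classes, which softens
    the penalty for (possibly mislabeled) hard targets."""
    one_hot = np.eye(num_classes)[labels]
    return (1.0 - eps) * one_hot + eps / num_classes

targets = smooth_labels(np.array([0, 2]), num_classes=3, eps=0.1)
```

Each smoothed row still sums to 1, so the result can be dropped into any cross-entropy loss that accepts soft targets.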
Optical Remote Sensing Image Understanding with Weak Supervision: Concepts, Methods, and Perspectives
In recent years, supervised learning has been widely used in various tasks of optical remote sensing image understanding, including remote sensing image classification and pixel-wise segmentation.
AI Total: Analyzing Security ML Models with Imperfect Data in Production
A web-based visualization system that allows users to quickly gather headline performance numbers while maintaining confidence that the underlying data pipeline is functioning properly, and a novel way to analyze performance under data issues using a data-coverage equalizer.
A Practical Overview of Safety Concerns and Mitigation Methods for Visual Deep Learning Algorithms
An in-depth look at the underlying causes of faults in visual deep learning algorithms is provided, yielding a practical and complete list of safety concerns with potential state-of-the-art mitigation methods.
How to Find Actionable Static Analysis Warnings
Automatically generated static code warnings suffer from a large number of false alarms; hence, developers act on only a small percentage of them. This work studies how to better predict which static code warnings are actionable.


Learning to Learn From Noisy Labeled Data
This work proposes a noise-tolerant training algorithm in which a meta-learning update is performed prior to the conventional gradient update, training the model so that after one gradient update using each set of synthetic noisy labels it does not overfit to the specific noise.
Iterative Cross Learning on Noisy Labels
This work proposes a novel and simple training strategy, Iterative Cross Learning (ICL), that significantly improves the classification accuracy of neural networks trained on data with noisy labels, and can easily be combined with other existing methods that address noisy labels, improving their performance.
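The cross-checking intuition behind ICL can be sketched as follows: two models are trained on disjoint halves of the data, and a sample's label is trusted only when both models agree with it. This is our own minimal reading of one step of the iterative scheme; names are hypothetical:

```python
import numpy as np

def cross_agreement(preds_a, preds_b, labels):
    """Given predictions from two models trained on disjoint data halves,
    return indices of samples whose given label both models agree with;
    the remaining samples are candidates for relabeling or down-weighting."""
    preds_a, preds_b, labels = map(np.asarray, (preds_a, preds_b, labels))
    trusted = (preds_a == labels) & (preds_b == labels)
    return np.flatnonzero(trusted)
```

In the full iterative scheme this check would be repeated across rounds, with disagreements resolved before the next round of training.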
Learning from Noisy Labels with Deep Neural Networks
A novel way of modifying deep learning models so they can be effectively trained on data with a high level of label noise is proposed, and it is shown that random images without labels can improve classification performance.
Understanding and Utilizing Deep Neural Networks Trained with Noisy Labels
This paper finds that the test accuracy can be quantitatively characterized in terms of the noise ratio in datasets, and adopts the Co-teaching strategy which takes full advantage of the identified samples to train DNNs robustly against noisy labels.
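The Co-teaching selection rule mentioned above is easy to sketch: two networks exchange the small-loss fraction of each mini-batch, since small-loss samples are more likely to be correctly labeled. A minimal NumPy sketch (the function name and fixed keep ratio are our own simplifications; the original anneals the ratio using the estimated noise rate):

```python
import numpy as np

def coteach_select(losses_a, losses_b, keep_ratio):
    """Each network picks the small-loss subset of the batch for its peer.

    Returns (idx_for_a, idx_for_b): the indices network A should train on
    (chosen by network B's losses) and vice versa."""
    k = int(keep_ratio * len(losses_a))
    idx_for_b = np.argsort(losses_a)[:k]  # A picks clean-looking samples for B
    idx_for_a = np.argsort(losses_b)[:k]  # B picks clean-looking samples for A
    return idx_for_a, idx_for_b
```

In training, `losses_a` and `losses_b` would be the per-sample cross-entropy of two independently initialized networks on the same mini-batch; exchanging selections keeps the two networks from reinforcing their own mistakes.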
Iterative Learning with Open-set Noisy Labels
A novel iterative learning framework for training CNNs on datasets with open-set noisy labels that detects noisy labels and learns deep discriminative features in an iterative fashion and designs a Siamese network to encourage clean labels and noisy labels to be dissimilar.
SELF: Learning to Filter Noisy Labels with Self-Ensembling
This work presents a simple and effective method self-ensemble label filtering (SELF) to progressively filter out the wrong labels during training that substantially outperforms all previous works on noise-aware learning across different datasets and can be applied to a broad set of network architectures.
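The progressive filtering idea above can be sketched by keeping a moving ensemble of each sample's predictions across epochs and retaining only samples whose ensembled prediction agrees with the given label. This is a simplified reading; the names, the exponential-moving-average form, and the agreement test are our own assumptions:

```python
import numpy as np

def self_filter(pred_history, labels, momentum=0.9):
    """Maintain an exponential moving average of each sample's predicted
    class probabilities across epochs and keep only samples whose ensembled
    prediction agrees with the given label.

    pred_history: array of shape (epochs, samples, classes)."""
    ema = pred_history[0]
    for preds in pred_history[1:]:
        ema = momentum * ema + (1.0 - momentum) * preds
    keep = ema.argmax(axis=1) == np.asarray(labels)
    return np.flatnonzero(keep)
```

Because the ensemble averages over epochs, a single-epoch fluctuation does not flip a sample's status, which is what makes the filtering progressive rather than brittle.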
DivideMix: Learning with Noisy Labels as Semi-supervised Learning
This work proposes DivideMix, a novel framework for learning with noisy labels by leveraging semi-supervised learning techniques, which models the per-sample loss distribution with a mixture model to dynamically divide the training data into a labeled set with clean samples and an unlabeled set with noisy samples.
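The loss-modeling step above can be sketched directly: per-sample losses are fit with a two-component mixture, and the posterior of the low-loss component serves as the probability that a sample is clean. A compact NumPy sketch with a hand-rolled 1-D EM (all names are our own; DivideMix itself uses a library GMM inside a full semi-supervised pipeline):

```python
import numpy as np

def clean_probability(losses, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture to per-sample losses with EM
    and return each sample's posterior probability of belonging to the
    low-mean (clean) component."""
    losses = np.asarray(losses, dtype=float)
    mu = np.array([losses.min(), losses.max()])  # component means
    var = np.full(2, losses.var() + 1e-6)        # component variances
    pi = np.array([0.5, 0.5])                    # mixing weights
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample
        dens = pi / np.sqrt(2 * np.pi * var) * np.exp(
            -(losses[:, None] - mu) ** 2 / (2 * var))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate means, variances, and mixing weights
        nk = resp.sum(axis=0)
        mu = (resp * losses[:, None]).sum(axis=0) / nk
        var = (resp * (losses[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
        pi = nk / len(losses)
    return resp[:, np.argmin(mu)]
```

Samples with high clean probability go to the labeled set; the rest are treated as unlabeled and handled by the semi-supervised branch.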
Learning Not to Learn in the Presence of Noisy Labels
It is shown that a new class of loss functions called the gambler's loss provides strong robustness to label noise across various levels of corruption, resulting in a simple and effective method to improve robustness and generalization.
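The gambler's loss gives the network an extra abstention output whose probability mass, discounted by a payoff factor, is added to the true-class probability inside the log. A hedged NumPy sketch of that formula (placing the abstain output in the last column, and the names, are our own assumptions):

```python
import numpy as np

def gamblers_loss(probs, labels, payoff=2.5):
    """Gambler's loss sketch: probs has num_classes + 1 columns, the last
    being the 'abstain' output. Probability hedged on abstention is credited
    at a discounted rate 1/payoff, so the model can bail out on samples it
    suspects are mislabeled instead of fitting them."""
    p_true = probs[np.arange(len(labels)), labels]
    p_abstain = probs[:, -1]
    return -np.log(p_true + p_abstain / payoff).mean()
```

A payoff between 1 and the number of classes controls how cheap abstention is: the lower the payoff, the more readily the model abstains on suspect samples.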
Training Deep Neural Networks on Noisy Labels with Bootstrapping
A generic way to handle noisy and incomplete labeling is proposed, augmenting the prediction objective with a notion of consistency: a prediction is consistent if the same prediction is made given similar percepts, where similarity is measured between deep network features computed from the input data.
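The bootstrapping objective behind this entry blends the given (possibly noisy) one-hot target with the model's own prediction before computing cross-entropy; the "soft" variant can be sketched as (names are our own, and the paper applies this inside standard SGD training):

```python
import numpy as np

def soft_bootstrap_loss(probs, labels, beta=0.95):
    """Soft bootstrapping cross-entropy: the target is a convex combination
    of the one-hot label (weight beta) and the model's current prediction
    (weight 1 - beta), so confident model beliefs can partially override a
    suspect label."""
    probs = np.clip(probs, 1e-12, 1.0)
    one_hot = np.eye(probs.shape[1])[labels]
    target = beta * one_hot + (1.0 - beta) * probs
    return -(target * np.log(probs)).sum(axis=1).mean()
```

With beta = 1 this reduces to ordinary cross-entropy; lowering beta increases how much the model is allowed to trust its own predictions over the annotations.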
Deep Self-Learning From Noisy Labels
This work presents a novel deep self-learning framework to train a robust network on the real noisy datasets without extra supervision, which is effective and efficient and outperforms its counterparts in all empirical settings.