• Corpus ID: 211132676

Learning Not to Learn in the Presence of Noisy Labels

Liu Ziyin, Blair Chen, Ru Wang, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency, Masahito Ueda

Learning in the presence of label noise is a challenging yet important task: it is crucial to design models that are robust in the presence of mislabeled datasets. In this paper, we discover that a new class of loss functions called the gambler's loss provides strong robustness to label noise across various levels of corruption. We show that training with this loss function encourages the model to "abstain" from learning on the data points with noisy labels, resulting in a simple and effective… 
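
As a rough illustration of the abstention idea in the abstract, here is a minimal NumPy sketch of one common form of the gambler's loss. The form used, `-log(p_y + p_abstain / o)` with an extra "abstain" output and a payoff hyperparameter `o` assumed to lie in (1, m], is not stated on this page and is an assumption here, as are the function and argument names.

```python
import numpy as np

def gamblers_loss(logits, labels, payoff):
    """Mean gambler's loss for logits of shape (n, m + 1).

    Assumption: the last column is an extra "abstain" output and
    `payoff` (often written o) is a hyperparameter in (1, m].
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    # Numerically stable softmax over all m + 1 outputs.
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    # Probability assigned to the given (possibly noisy) label.
    p_label = probs[np.arange(len(labels)), labels]
    # Probability mass placed on abstaining.
    p_abstain = probs[:, -1]
    return float(np.mean(-np.log(p_label + p_abstain / payoff)))

# Same wrong-looking label, with and without mass on the abstain output.
confident = gamblers_loss([[5.0, 0.0, -5.0]], [1], payoff=2.0)
abstaining = gamblers_loss([[5.0, 0.0, 5.0]], [1], payoff=2.0)
```

In this sketch, shifting probability mass to the abstain output lowers the loss on a point whose label disagrees with the model's prediction, which is the mechanism the abstract describes as abstaining on noisy labels.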

Learning to Combat Noisy Labels via Classification Margins

MARVEL (MARgins Via Early Learning) is proposed, in which the goodness of “fit” for every instance is tracked by maintaining an epoch-history of its classification margins; MARVEL outperforms other baselines consistently across different noise levels, with a significantly larger margin under asymmetric noise.

A Survey on Deep Learning with Noisy Labels: How to train your model when you cannot trust on the annotations?

  • F. Cordeiro, G. Carneiro
  • Computer Science
    2020 33rd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)
  • 2020
A survey of the main techniques in the literature for improving the training of deep learning models in the presence of noisy labels is presented, in which the algorithms are classified into the following groups: robust losses, sample weighting, sample selection, meta-learning, and combined approaches.

An Investigation of how Label Smoothing Affects Generalization

A theoretical framework is proposed to show how label smoothing helps control the generalization loss, and it is shown that this benefit can be precisely formulated and identified in the label noise setting, where the training data is partially mislabeled.

Distributional Generalization: A New Kind of Generalization

We introduce a new notion of generalization -- Distributional Generalization -- which roughly states that outputs of a classifier at train and test time are close *as distributions*, as opposed to

Dropout can Simulate Exponential Number of Models for Sample Selection Techniques

Not only is it more convenient to use a single model with Dropout, but this approach also combines the natural benefits of Dropout with that of training an exponential number of models, leading to improved results.

Classification Under Human Assistance

It is demonstrated that, under human assistance, supervised learning models trained to operate under different automation levels can outperform those trained for full automation as well as humans operating alone.

Volumization as a Natural Generalization of Weight Decay

It is proved, on a toy example, that the essence of this method is a regularization technique to control bias-variance tradeoff, and might lead to a simple method for training a neural network whose weight is binary or ternary.

Differentiable Learning Under Triage

This work starts by formally characterizing under which circumstances a predictive model may benefit from algorithmic triage, and introduces a practical gradient-based algorithm that is guaranteed to find a sequence of predictive models and triage policies of increasing performance.

Reinforcement Learning Under Algorithmic Triage

Extensive simulation experiments in a synthetic car driving task show that the machine models and the triage policies trained using the two-stage actor-critic method effectively complement human policies and outperform those provided by several competitive baselines.

Think Locally, Act Globally: Federated Learning with Local and Global Representations

A new federated learning algorithm is proposed that jointly learns compact local representations on each device and a global model across all devices, which helps to keep device data private and enable communication-efficient training while retaining performance.



How does Disagreement Help Generalization against Label Corruption?

A robust learning paradigm called Co-teaching+ is proposed, which bridges the "Update by Disagreement" strategy with the original Co-teaching and is much superior to many state-of-the-art methods in the robustness of trained models.

L_DMI: An Information-theoretic Noise-robust Loss Function

A novel information-theoretic loss function, Determinant based Mutual Information (DMI), is proposed for training deep neural networks robust to label noise, and it is shown empirically to outperform all other counterparts in classification tasks on both image and natural language datasets.

Simple and Effective Regularization Methods for Training on Noisily Labeled Data with Generalization Guarantee

This paper proposes and analyzes two simple and intuitive regularization methods and proves that gradient descent training with either of these two methods leads to a generalization guarantee on the clean data distribution despite being trained using noisy labels.

Gradient Descent with Early Stopping is Provably Robust to Label Noise for Overparameterized Neural Networks

Under a rich dataset model, it is shown that gradient descent is provably robust to noise/corruption on a constant fraction of the labels despite overparameterization, shedding light on the empirical robustness of deep networks as well as commonly adopted heuristics to prevent overfitting.

Avoiding Your Teacher's Mistakes: Training Neural Networks with Controlled Weak Supervision

A semi-supervised learning method where two neural networks are trained in a multi-task fashion: a "target network" and a "confidence network". The target network is optimized to perform a given task and is trained using a large set of unlabeled data that are weakly annotated.

Making Deep Neural Networks Robust to Label Noise: A Loss Correction Approach

It is proved that, when ReLU is the only non-linearity, the loss curvature is immune to class-dependent label noise, and it is shown how one can estimate the probabilities of each class being corrupted into another, adapting a recent technique for noise estimation to the multi-class setting and providing an end-to-end framework.

Understanding Generalization of Deep Neural Networks Trained with Noisy Labels

Two simple regularization methods are proposed that are related to kernel ridge regression with respect to the NTK, and a generalization guarantee on the true data distribution is proved for them despite training on noisy labels.

Understanding deep learning requires rethinking generalization

These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and confirm that simple depth two neural networks already have perfect finite sample expressivity.

SGD on Neural Networks Learns Functions of Increasing Complexity

Key to the work is a new measure of how well one classifier explains the performance of another, based on conditional mutual information, which can be helpful in explaining why SGD-learned classifiers tend to generalize well even in the over-parameterized regime.

Adam: A Method for Stochastic Optimization

This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.