• Corpus ID: 208247893

Synthetic vs Real: Deep Learning on Controlled Noise

  title={Synthetic vs Real: Deep Learning on Controlled Noise},
  author={Lu Jiang and Di Huang and Weilong Yang},
Performing controlled experiments on noisy data is essential in thoroughly understanding deep learning across a spectrum of noise levels. Due to the lack of suitable datasets, previous research have only examined deep learning on controlled synthetic noise, and real-world noise has never been systematically studied in a controlled setting. To this end, this paper establishes a benchmark of real-world noisy labels at 10 controlled noise levels. As real-world noise possesses unique properties, to… 
Sharpness-Aware Minimization for Efficiently Improving Generalization
This work introduces a novel, effective procedure for simultaneously minimizing loss value and loss sharpness, Sharpness-Aware Minimization (SAM), which improves model generalization across a variety of benchmark datasets and models, yielding novel state-of-the-art performance for several.
AdvAug: Robust Adversarial Augmentation for Neural Machine Translation
The main idea is to minimize the vicinal risk over virtual sentences sampled from two vicinity distributions, in which the crucial one is a novel vicinity distribution for adversarial sentences that describes a smooth interpolated embedding space centered around observed training sentence pairs.
Image recognition from raw labels collected without annotators
This paper proposes to work without any explicitly labeled data by directly training the deep neural network on the noisy candidate labels, and early stopping the training to avoid overfitting, and shows thatTraining on the candidate examples and regularizing through early stopping gives higher test performance for both problems than when training on the original, clean data.


Learning to Learn From Noisy Labeled Data
This work proposes a noise-tolerant training algorithm, where a meta-learning update is performed prior to conventional gradient update, and trains the model such that after one gradient update using each set of synthetic noisy labels, the model does not overfit to the specific noise.
Deep Learning is Robust to Massive Label Noise
It is shown that deep neural networks are capable of generalizing from training data for which true labels are massively outnumbered by incorrect labels, and that training in this regime requires a significant but manageable increase in dataset size that is related to the factor by which correct labels have been diluted.
Training Deep Neural Networks on Noisy Labels with Bootstrapping
A generic way to handle noisy and incomplete labeling by augmenting the prediction objective with a notion of consistency is proposed, which considers a prediction consistent if the same prediction is made given similar percepts, where the notion of similarity is between deep network features computed from the input data.
Learning from Noisy Labels with Distillation
This work proposes a unified distillation framework to use “side” information, including a small clean dataset and label relations in knowledge graph, to “hedge the risk” of learning from noisy labels, and proposes a suite of new benchmark datasets to evaluate this task in Sports, Species and Artifacts domains.
Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise
It is demonstrated that robustness to label noise up to severe strengths can be achieved by using a set of trusted data with clean labels, and a loss correction that utilizes trusted examples in a data-efficient manner to mitigate the effects of label noise on deep neural network classifiers is proposed.
Learning from Noisy Large-Scale Datasets with Minimal Supervision
An approach to effectively use millions of images with noisy annotations in conjunction with a small subset of cleanly-annotated images to learn powerful image representations and is particularly effective for a large number of classes with wide range of noise in annotations.
Iterative Learning with Open-set Noisy Labels
A novel iterative learning framework for training CNNs on datasets with open-set noisy labels that detects noisy labels and learns deep discriminative features in an iterative fashion and designs a Siamese network to encourage clean labels and noisy labels to be dissimilar.
Regularizing Deep Neural Networks by Noise: Its Interpretation and Optimization
This paper interprets that the conventional training methods with regularization by noise injection optimize the lower bound of the true objective and proposes a technique to achieve a tighter lower bound using multiple noise samples per training example in a stochastic gradient descent iteration.
Training deep neural-networks using a noise adaptation layer
This study presents a neural-network approach that optimizes the same likelihood function as optimized by the EM algorithm but extended to the case where the noisy labels are dependent on the features in addition to the correct labels.
Understanding deep learning requires rethinking generalization
These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and confirm that simple depth two neural networks already have perfect finite sample expressivity.