Learning from Noisy Large-Scale Datasets with Minimal Supervision
@article{Veit2017LearningFN,
  title={Learning from Noisy Large-Scale Datasets with Minimal Supervision},
  author={Andreas Veit and Neil Gordon Alldrin and Gal Chechik and Ivan Krasin and Abhinav Kumar Gupta and Serge J. Belongie},
  journal={2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2017},
  pages={6575-6583}
}
We present an approach to effectively use millions of images with noisy annotations in conjunction with a small subset of cleanly-annotated images to learn powerful image representations. One common approach to combine clean and noisy data is to first pre-train a network using the large noisy dataset and then fine-tune with the clean dataset. We show this approach does not fully leverage the information contained in the clean set. Thus, we demonstrate how to use the clean annotations to reduce…
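A toy illustration of how a small verified subset can help reduce noise in the large set: the sketch below estimates a noisy-to-clean confusion matrix from hypothetical verified (noisy, clean) label pairs and uses it to relabel the noisy set. This is a deliberately simplified stand-in, not the paper's method (the paper learns a label cleaning network rather than a single confusion matrix); the data, noise process, and names here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes = 3

# Hypothetical verified subset: pairs of (noisy label, human-verified label).
# The synthetic noise flips a label to the next class with probability 0.3.
noisy_v = rng.integers(0, n_classes, size=500)
flip = rng.random(500) < 0.3
clean_v = np.where(flip, (noisy_v + 1) % n_classes, noisy_v)

# Estimate P(clean | noisy) from the verified subset (row-stochastic matrix).
C = np.zeros((n_classes, n_classes))
np.add.at(C, (noisy_v, clean_v), 1.0)
T = C / C.sum(axis=1, keepdims=True)

# "Clean" the large noisy set by mapping each noisy label to the most
# probable clean label under the estimated matrix.
noisy_big = rng.integers(0, n_classes, size=10)
cleaned = T[noisy_big].argmax(axis=1)
```

With the noise rate below 0.5, the estimated matrix is diagonally dominant and the hard relabeling keeps the original labels; the value of the estimate lies in the soft probabilities, which a training loss can weight.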
335 Citations
Iterative Learning with Open-set Noisy Labels
- Computer Science · 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018
A novel iterative learning framework for training CNNs on datasets with open-set noisy labels that detects noisy labels and learns deep discriminative features in an iterative fashion, and designs a Siamese network to encourage clean labels and noisy labels to be dissimilar.
Learning to Learn From Noisy Labeled Data
- Computer Science · 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019
This work proposes a noise-tolerant training algorithm, where a meta-learning update is performed prior to conventional gradient update, and trains the model such that after one gradient update using each set of synthetic noisy labels, the model does not overfit to the specific noise.
Distilling Effective Supervision From Severe Label Noise
- Computer Science · 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2020
This paper presents a holistic framework for training deep neural networks that is highly robust to label noise and achieves excellent performance on large-scale datasets with real-world label noise.
Audio Tagging by Cross Filtering Noisy Labels
- Computer Science · IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2020
This article presents a novel framework, named CrossFilter, to combat the noisy-label problem in audio tagging; it achieves state-of-the-art performance and even surpasses ensemble models on the FSDKaggle2018 dataset.
Graph convolutional networks for learning with few clean and many noisy labels
- Computer Science · ECCV
- 2020
Experimental results show that the GCN-based cleaning process significantly improves classification accuracy over both training on the uncleaned noisy data and standard few-shot classification that uses only the few clean examples.
FINE Samples for Learning with Noisy Labels
- Computer Science · NeurIPS
- 2021
A novel detector for filtering label noise that focuses on each data point's latent-representation dynamics and measures the alignment between the latent distribution and each representation using the eigendecomposition of the data gram matrix, yielding a robust, derivative-free detector with theoretical guarantees.
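The eigendecomposition idea can be sketched in a few lines: form the gram matrix of one class's feature vectors, extract its principal eigenvector, and score each sample by its squared cosine alignment with that direction. The synthetic features below are invented for illustration, and this sketch omits the mixture model that the actual method fits to the scores to choose a filtering threshold.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical features for one class: most points lie along a dominant
# direction (clean samples), a few are isotropic outliers (noisy samples).
direction = np.array([1.0, 0.0, 0.0])
clean = direction * rng.normal(3.0, 0.3, size=(50, 1)) + rng.normal(0, 0.1, size=(50, 3))
noisy = rng.normal(0, 1.0, size=(5, 3))
feats = np.vstack([clean, noisy])

# Gram matrix of the class features and its principal eigenvector.
gram = feats.T @ feats
eigvals, eigvecs = np.linalg.eigh(gram)   # eigenvalues in ascending order
u = eigvecs[:, -1]                        # eigenvector of the largest eigenvalue

# Alignment score: squared cosine between each sample and the principal
# eigenvector. Clean samples should score close to 1, outliers lower.
norms = np.linalg.norm(feats, axis=1)
score = (feats @ u) ** 2 / norms ** 2
```

Because the score squares the projection, the sign ambiguity of the eigenvector is irrelevant, and the scores are bounded in [0, 1].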
Beyond Synthetic Noise: Deep Learning on Controlled Noisy Labels
- Computer Science · ICML
- 2020
This paper establishes the first benchmark of controlled real-world label noise from the web, and conducts the largest study by far into understanding deep neural networks trained on noisy labels across different noise levels, noise types, network architectures, and training settings.
Learning from Noisy Data with Robust Representation Learning
- Computer Science · 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2021
This work embeds images into a low-dimensional subspace and regularizes the geometric structure of the subspace with robust contrastive learning, which combines an unsupervised consistency loss and a supervised mixup prototypical loss.
Weakly Supervised Image Classification Through Noise Regularization
- Computer Science · 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019
Experimental results show that the proposed approach outperforms the state-of-the-art methods, and generalizes well to both single-label and multi-label scenarios.
CurriculumNet: Weakly Supervised Learning from Large-Scale Web Images
- Computer Science · ECCV
- 2018
Experiments show that images with highly noisy labels can, surprisingly, improve the generalization capability of the model by serving as a form of regularization, resulting in a high-performance CNN model in which the negative impact of noisy labels is substantially reduced.
References
Showing 1-10 of 38 references
Training Convolutional Networks with Noisy Labels
- Computer Science · ICLR 2014
- 2014
An extra noise layer is introduced into the network which adapts the network outputs to match the noisy label distribution; the layer can be estimated as part of the training process and requires only simple modifications to current training infrastructures for deep networks.
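The noise layer described above is, in essence, a linear map by a transition matrix: if Q[i, j] = P(noisy label j | true class i), the network's clean-class posterior p is converted to a noisy-label distribution Q^T p before the loss is applied. A minimal numerical sketch follows; the matrix values here are made up, whereas in the cited work the matrix is estimated during training.

```python
import numpy as np

# Hypothetical noise transition matrix: Q[i, j] = P(noisy = j | true = i).
# Rows sum to 1.
Q = np.array([
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.1, 0.7],
])

def noisy_output(p_clean, Q):
    """Adapt the network's clean-class posterior to the noisy label
    distribution via the linear 'noise layer': p_noisy = Q^T p_clean."""
    return Q.T @ p_clean

# A confident clean prediction for class 0 is softened toward the
# distribution of labels actually observed in the noisy data.
p_clean = np.array([0.9, 0.05, 0.05])
p_noisy = noisy_output(p_clean, Q)
```

Because Q is row-stochastic, the output remains a valid probability distribution, so the usual cross-entropy against the noisy label can be applied to p_noisy directly.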
Learning from massive noisy labeled data for image classification
- Computer Science · 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2015
A general framework is introduced to train CNNs with only a limited number of clean labels and millions of easily obtained noisy labels; the relationships between images, class labels, and label noise are modeled with a probabilistic graphical model that is further integrated into an end-to-end deep learning system.
Learning Visual Features from Large Weakly Supervised Data
- Computer Science · ECCV
- 2016
This paper trains convolutional networks on a dataset of 100 million Flickr photos and comments, and shows that these networks produce features that perform well in a range of vision problems.
Unsupervised Visual Representation Learning by Context Prediction
- Computer Science · 2015 IEEE International Conference on Computer Vision (ICCV)
- 2015
It is demonstrated that the feature representation learned using this within-image context indeed captures visual similarity across images and allows us to perform unsupervised visual discovery of objects like cats, people, and even birds from the Pascal VOC 2011 detection dataset.
Semi-Supervised Learning in Gigantic Image Collections
- Computer Science · NIPS
- 2009
This paper uses the convergence of the eigenvectors of the normalized graph Laplacian to eigenfunctions of weighted Laplace-Beltrami operators to obtain highly efficient approximations for semi-supervised learning that are linear in the number of images.
Building high-level features using large scale unsupervised learning
- Computer Science · 2013 IEEE International Conference on Acoustics, Speech and Signal Processing
- 2013
Contrary to what appears to be a widely-held intuition, the experimental results reveal that it is possible to train a face detector without having to label images as containing a face or not.
Webly Supervised Learning of Convolutional Networks
- Computer Science · 2015 IEEE International Conference on Computer Vision (ICCV)
- 2015
This work uses easy images to train an initial visual representation and then adapts this initial CNN to harder, more realistic images by leveraging the structure of data and categories; it demonstrates the strength of webly supervised learning by localizing objects in web images and training an R-CNN-style detector.
Deep Residual Learning for Image Recognition
- Computer Science · 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Seeing through the Human Reporting Bias: Visual Classifiers from Noisy Human-Centric Labels
- Computer Science · 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016
This paper proposes an algorithm to decouple the human reporting bias from the correct visually grounded labels, and shows significant improvements over traditional algorithms for both image classification and image captioning, doubling the performance of existing methods in some cases.
Learning with Noisy Labels
- Computer Science · NIPS
- 2013
The problem of binary classification in the presence of random classification noise is studied theoretically: the learner sees labels that have been independently flipped with some small probability, and methods used in practice such as the biased SVM and weighted logistic regression are shown to be provably noise-tolerant.
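The noise-tolerance construction can be illustrated with the method of unbiased estimators: for binary labels y in {+1, -1} with flip rates rho_+ and rho_-, the corrected loss l_tilde(t, y) = ((1 - rho_{-y}) * l(t, y) - rho_y * l(t, -y)) / (1 - rho_+ - rho_-) satisfies E[l_tilde(t, y_noisy)] = l(t, y) over the random flip. A small sketch checking this identity numerically; the logistic loss and the particular noise rates are arbitrary choices for the demonstration.

```python
import math

def corrected_loss(loss, t, y, rho_pos, rho_neg):
    """Unbiased estimator of loss(t, y) under class-conditional label noise:
    the expectation over the randomly flipped label equals the clean loss."""
    rho_y = rho_pos if y == 1 else rho_neg    # flip rate of the true class
    rho_ny = rho_neg if y == 1 else rho_pos   # flip rate of the other class
    return ((1 - rho_ny) * loss(t, y) - rho_y * loss(t, -y)) / (1 - rho_pos - rho_neg)

def logistic(t, y):
    # Standard logistic loss for margin y * t.
    return math.log(1 + math.exp(-y * t))

# Take the expectation over the noisy label by hand: with true label y = +1,
# the observed label is +1 with probability 1 - rho_pos and -1 otherwise.
rho_pos, rho_neg = 0.2, 0.3
t, y = 0.7, 1
expect = ((1 - rho_pos) * corrected_loss(logistic, t, y, rho_pos, rho_neg)
          + rho_pos * corrected_loss(logistic, t, -y, rho_pos, rho_neg))
```

The identity holds for any loss and any rho_+ + rho_- < 1, which is exactly why minimizing the corrected loss on noisy labels behaves, in expectation, like minimizing the original loss on clean labels.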