Revisiting Unreasonable Effectiveness of Data in Deep Learning Era

@inproceedings{Sun2017RevisitingUE,
  title={Revisiting Unreasonable Effectiveness of Data in Deep Learning Era},
  author={Chen Sun and Abhinav Shrivastava and Saurabh Singh and Abhinav Gupta},
  booktitle={2017 IEEE International Conference on Computer Vision (ICCV)},
  year={2017},
  pages={843-852}
}
The success of deep learning in vision can be attributed to: (a) models with high capacity; (b) increased computational power; and (c) availability of large-scale labeled data. Since 2012, there have been significant advances in representation capabilities of the models and computational capabilities of GPUs. But the size of the biggest dataset has surprisingly remained constant. What will happen if we increase the dataset size by 10× or 100×? This paper takes a step towards clearing the… 
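As an illustration of the scaling question posed in the abstract, the sketch below fits a log-linear curve to accuracy measurements taken at increasing dataset sizes; the sizes, accuracies, and the log-linear form are placeholder assumptions, not results reported in the paper.

```python
# Hypothetical sketch of a data-scaling study: measure accuracy at increasing
# dataset sizes (e.g. 10x, 100x) and fit a curve in the log domain.
# All numbers below are made-up placeholders, not figures from the paper.
import numpy as np

dataset_sizes = np.array([1e6, 1e7, 1e8, 3e8])      # images (hypothetical)
accuracies    = np.array([0.62, 0.68, 0.73, 0.76])  # task accuracy (hypothetical)

# Least-squares fit of: accuracy = a * log10(size) + b
a, b = np.polyfit(np.log10(dataset_sizes), accuracies, deg=1)
print(f"fitted gain per 10x more data: {a:.3f}")

# Extrapolate to a further 10x increase in data
print(f"predicted accuracy at 3e9 images: {a * np.log10(3e9) + b:.3f}")
```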
On The State of Data In Computer Vision: Human Annotations Remain Indispensable for Developing Deep Learning Models
TLDR
The findings are that research on optimization for deep learning focuses on perfecting the training routine and thus making DL models less data hungry, while research on synthetic datasets aims to offset the cost of data labeling.
Billion-Scale Pretraining with Vision Transformers for Multi-Task Visual Representations
TLDR
This work describes how to generate a dataset of over a billion images via large-scale weakly supervised pretraining to improve the performance of multi-task visual representations, and leverages Transformers to replace the traditional convolutional backbone, with insights into both system and performance improvements, especially at 1B+ image scale.
Semantic Redundancies in Image-Classification Datasets: The 10% You Don't Need
TLDR
This work asks whether, for common benchmark datasets, one can do better than random subsets of the data and find a subset that generalizes on par with the full dataset when trained on, and it observes semantic correlations between required and redundant images.
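For context, the random-subset baseline mentioned above takes only a few lines to set up; the sketch below (assuming PyTorch) drops 10% of a toy dataset at random with torch.utils.data.Subset and is not the paper's redundancy-aware selection method.

```python
# Illustrative sketch (not the paper's selection method): build a random 90%
# subset of a toy dataset using torch.utils.data.Subset.
import torch
from torch.utils.data import Subset, DataLoader, TensorDataset

# Toy stand-in; in practice this would be an ImageNet-style dataset.
full_dataset = TensorDataset(torch.randn(1000, 3, 32, 32),
                             torch.randint(0, 10, (1000,)))

keep_fraction = 0.9
num_keep = int(keep_fraction * len(full_dataset))
perm = torch.randperm(len(full_dataset))[:num_keep]
subset = Subset(full_dataset, perm.tolist())

full_loader   = DataLoader(full_dataset, batch_size=64, shuffle=True)
subset_loader = DataLoader(subset, batch_size=64, shuffle=True)
print(len(full_dataset), len(subset))  # 1000 900
```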
Hybrid BYOL-ViT: Efficient approach to deal with small datasets
TLDR
This paper investigates how self-supervision with strong and sufficient augmentation of unlabeled data can effectively train the first layers of a neural network, even better than supervised learning, with no need for millions of labeled examples.
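A rough sketch of the two-view, strongly augmented input pipeline typical of BYOL-style self-supervision is shown below; the particular augmentations and parameters are illustrative assumptions, not the recipe used in the paper.

```python
# Minimal sketch of a two-view, strongly augmented pipeline for BYOL-style
# self-supervision on unlabeled images. Exact transforms are assumptions.
from torchvision import transforms

strong_augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.2, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),
    transforms.RandomGrayscale(p=0.2),
    transforms.GaussianBlur(kernel_size=23),
    transforms.ToTensor(),
])

def two_views(pil_image):
    """Return two independently augmented views of one unlabeled image."""
    return strong_augment(pil_image), strong_augment(pil_image)
```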
Large Minibatch Training on Supercomputers with Improved Accuracy and Reduced Time to Train
TLDR
The challenges and novel solutions needed in order to train ResNet-50 in a large-scale environment are described, and the novel Collapsed Ensemble technique is introduced that allows us to obtain a 77.5 percent top-1 accuracy, similar to that of a ResNet-152, while training an unmodified ResNet-50 topology for the same fixed training budget.
Scale out for large minibatch SGD: Residual network training on ImageNet-1K with improved accuracy and reduced time to train
TLDR
The challenges and novel solutions needed in order to train ResNet-50 in this large-scale environment are described, and the novel Collapsed Ensemble (CE) technique is introduced that allows for a 77.5% top-1 accuracy, similar to that of a ResNet-152, while training an unmodified ResNet-50 topology for the same fixed training budget.
Exploring the Limits of Weakly Supervised Pretraining
TLDR
This paper presents a unique study of transfer learning with large convolutional networks trained to predict hashtags on billions of social media images and shows improvements on several image classification and object detection tasks, and reports the highest ImageNet-1k single-crop, top-1 accuracy to date.
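The transfer-learning setup referenced here follows the usual pretrain-then-adapt pattern; the sketch below (assuming a recent torchvision) loads an ImageNet-pretrained ResNet-50 and swaps its head for a hypothetical 100-class target task, rather than reproducing the hashtag-pretrained models themselves.

```python
# Generic transfer-learning sketch: start from a pretrained backbone and
# replace the classification head for a new target task.
import torch.nn as nn
from torchvision import models

num_target_classes = 100  # assumption: some downstream task
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
backbone.fc = nn.Linear(backbone.fc.in_features, num_target_classes)

# Either fine-tune everything, or freeze the body and train only the new head:
for p in backbone.parameters():
    p.requires_grad = False
for p in backbone.fc.parameters():
    p.requires_grad = True
```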
Deep Learning is Robust to Massive Label Noise
TLDR
It is shown that deep neural networks are capable of generalizing from training data for which true labels are massively outnumbered by incorrect labels, and that training in this regime requires a significant but manageable increase in dataset size that is related to the factor by which correct labels have been diluted.
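One simple way to construct the label-dilution regime described above is sketched below; the dilution-by-duplication scheme and the noise_per_clean factor are illustrative assumptions and may differ from the paper's exact protocol.

```python
# Illustrative sketch of label dilution: keep the clean examples but add many
# duplicates with uniformly random labels, so correct labels are outnumbered
# by a chosen factor.
import numpy as np

def dilute_labels(labels, num_classes, noise_per_clean=10, rng=None):
    """Return index and label arrays in which each clean example is
    accompanied by `noise_per_clean` randomly labeled duplicates."""
    rng = rng or np.random.default_rng(0)
    n = len(labels)
    noisy_idx = np.repeat(np.arange(n), noise_per_clean)
    noisy_labels = rng.integers(0, num_classes, size=len(noisy_idx))
    all_idx = np.concatenate([np.arange(n), noisy_idx])
    all_labels = np.concatenate([labels, noisy_labels])
    return all_idx, all_labels

clean = np.array([0, 1, 2, 3])
idx, lab = dilute_labels(clean, num_classes=10, noise_per_clean=10)
print(len(lab) / len(clean))  # 11.0: ten noisy labels per clean one
```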
How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
TLDR
A systematic empirical study finds that the combination of increased compute and AugReg can yield models with the same performance as models trained on an order of magnitude more training data.
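An "AugReg"-style recipe combines stronger data augmentation with model regularization; the sketch below lists one plausible combination (RandAugment plus mixup, dropout, and weight decay), with all specific values being assumptions rather than the paper's settings.

```python
# Rough sketch of augmentation + regularization for training vision models.
# The specific choices and magnitudes are illustrative assumptions.
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandAugment(num_ops=2, magnitude=9),  # stronger augmentation
    transforms.ToTensor(),
])

regularization = {
    "weight_decay": 0.1,   # decoupled weight decay (e.g. with AdamW)
    "dropout": 0.1,        # dropout inside the transformer blocks
    "mixup_alpha": 0.2,    # mixup between training examples
}
```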
The Reasonable Effectiveness of Synthetic Visual Data
The recent successes in many visual recognition tasks, such as image classification, object detection, and semantic segmentation, can be attributed in large part to three factors: (i) advances in…
...

References

Showing 1-10 of 44 references
Deep Residual Learning for Image Recognition
TLDR
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
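The residual learning idea reduces to blocks that compute F(x) + x through an identity shortcut; a minimal sketch of such a block (in PyTorch, with illustrative layer sizes) follows.

```python
# Minimal residual block: the block learns a residual F(x) and outputs F(x) + x.
# Channel sizes here are illustrative; real architectures follow specific plans.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # identity shortcut

x = torch.randn(1, 64, 56, 56)
print(ResidualBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```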
Very Deep Convolutional Networks for Large-Scale Image Recognition
TLDR
This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
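The core design choice, stacks of 3x3 convolutions in place of larger filters, can be sketched as below; this is a toy fragment rather than a full 16- or 19-layer configuration.

```python
# Sketch of "very small filters, increased depth": stacks of 3x3 convolutions
# followed by 2x2 max-pooling. A toy fragment, not a complete network.
import torch.nn as nn

def conv_stack(in_ch, out_ch, num_convs):
    layers = []
    for i in range(num_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, kernel_size=3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

features = nn.Sequential(
    conv_stack(3, 64, 2),
    conv_stack(64, 128, 2),
    conv_stack(128, 256, 3),
)
```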
Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
Very deep convolutional networks have been central to the largest advances in image recognition performance in recent years. One example is the Inception architecture, which has been shown to…
What makes ImageNet good for transfer learning?
TLDR
The overall findings suggest that most changes in the choice of pre-training data long thought to be critical do not significantly affect transfer performance.
Going deeper with convolutions
We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge.
Large-Scale Deep Learning on the YFCC100M Dataset
We present a work-in-progress snapshot of learning with a 15 billion parameter deep learning network on HPC architectures, applied to the largest publicly available natural image and video dataset.
Unbiased look at dataset bias
TLDR
A comparison study using a set of popular datasets, evaluated based on a number of criteria including: relative data bias, cross-dataset generalization, effects of closed-world assumption, and sample value is presented.
Large Scale Distributed Deep Networks
TLDR
This paper considers the problem of training a deep network with billions of parameters using tens of thousands of CPU cores and develops two algorithms for large-scale distributed training, Downpour SGD and Sandblaster L-BFGS, which increase the scale and speed of deep network training.
The Unreasonable Effectiveness of Noisy Data for Fine-Grained Recognition
TLDR
This work introduces an alternative approach, leveraging free, noisy data from the web and simple, generic methods of recognition, and demonstrates its efficacy on four fine-grained datasets, greatly exceeding existing state of the art without the manual collection of even a single label.
Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks
TLDR
This work designs a method to reuse layers trained on the ImageNet dataset to compute mid-level image representation for images in the PASCAL VOC dataset, and shows that despite differences in image statistics and tasks in the two datasets, the transferred representation leads to significantly improved results for object and action classification.
...