Corpus ID: 224820076

Learning Curves for Analysis of Deep Networks

@inproceedings{Hoiem2021LearningCF,
  title={Learning Curves for Analysis of Deep Networks},
  author={Derek Hoiem and Tanmay Gupta and Zhizhong Li and Michal Shlapentokh-Rothman},
  booktitle={ICML},
  year={2021}
}
A learning curve models a classifier's test error as a function of the number of training samples. Prior works show that learning curves can be used to select model parameters and extrapolate performance. We investigate how to use learning curves to analyze the impact of design choices, such as pre-training, architecture, and data augmentation. We propose a method to robustly estimate learning curves, abstract their parameters into error and data-reliance, and evaluate the effectiveness of… 
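The paper's own robust estimation procedure is not reproduced here, but the basic idea of fitting a learning curve to a few (dataset size, test error) measurements can be sketched with a simple power-law fit, a common assumption in this literature. The function name and parameterization below are illustrative, not the paper's:

```python
import numpy as np

def fit_power_law(n, err):
    """Fit err(n) ~ a * n**(-b) by linear regression in log-log space.

    Returns (a, b); a larger b means error falls faster as training
    data grows (loosely, the model is less data-reliant at scale).
    """
    slope, log_a = np.polyfit(np.log(n), np.log(err), 1)
    return np.exp(log_a), -slope

# Synthetic measurements drawn exactly from err(n) = 2.0 * n**(-0.5)
n = np.array([100, 400, 1600, 6400])
err = 2.0 * n ** -0.5
a, b = fit_power_law(n, err)
print(a, b)  # recovers a close to 2.0 and b close to 0.5
```

In practice one would fit on errors measured from repeated training runs at several dataset sizes, and the fit would only be approximate.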

Citations

The Shape of Learning Curves: a Review
TLDR
This review recounts the origins of the term, gives a formal definition of the learning curve, and provides a comprehensive overview of the literature on the shape of learning curves.
Learning Curve Theory
TLDR
This work develops and theoretically analyses the simplest possible (toy) model that can exhibit n^{-β} learning curves for arbitrary power β > 0, and determines whether power laws are universal or depend on the data distribution.
Simple Control Baselines for Evaluating Transfer Learning
TLDR
This work proposes an evaluation standard that aims to quantify and communicate transfer-learning performance in an informative and accessible setup, and encourages using and reporting the suggested control baselines when evaluating transfer learning to gain a more meaningful understanding.
Data Scaling Laws in NMT: The Effect of Noise and Architecture
TLDR
This work establishes that the test loss of encoder-decoder transformer models scales as a power law in the number of training samples, with a dependence on the model size, and systematically varies aspects of the training setup to understand how they impact the data scaling laws.
Overview of Machine Learning Process Modelling
TLDR
Results are provided that can be used to assess the performance of novel or existing artificial learners and forecast their ‘capacity to learn’ based on the amount of available or desired data.
How Much More Data Do I Need? Estimating Requirements for Downstream Tasks
Given a small training data set and a learning algorithm, how much more data is necessary to reach a target validation or test performance? This question is of critical importance in applications…
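The question posed above can be made concrete: if one assumes a saturating power-law learning curve e(n) = e_inf + a * n**(-b) (one common model, not necessarily the paper's; the parameter values below are made up for illustration), the fitted curve can be inverted in closed form to estimate the data requirement:

```python
# Minimal sketch (not the cited paper's method): invert a fitted learning
# curve e(n) = e_inf + a * n**(-b) to estimate data needed for a target error.

def samples_for_target(e_inf, a, b, target_err):
    """Solve e(n) = target_err for n; valid only when target_err > e_inf."""
    if target_err <= e_inf:
        raise ValueError("target error is below the curve's asymptote")
    return (a / (target_err - e_inf)) ** (1.0 / b)

# Example: asymptotic error 5%, a = 2.0, b = 0.5; data needed for 10% error?
n_needed = samples_for_target(0.05, 2.0, 0.5, 0.10)
print(n_needed)  # (2.0 / 0.05)**2 = 1600 samples
```

The closed form exists because a power law is monotone in n; for more complex fitted forms one would invert numerically instead.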

References

Showing 1–10 of 45 references
Learning Curves: Asymptotic Values and Rate of Convergence
TLDR
This work proposes a practical and principled predictive method that avoids the costly procedure of training poor classifiers on the whole training set, and it is demonstrated for both single- and multi-layer networks.
Learning Multiple Layers of Features from Tiny Images
TLDR
It is shown how to train a multi-layer generative model that learns to extract meaningful features which resemble those found in the human visual cortex, using a novel parallelization algorithm to distribute the work among multiple machines connected on a network.
Deep Learning Scaling is Predictable, Empirically
TLDR
A large scale empirical characterization of generalization error and model size growth as training sets grow is presented and it is shown that model size scales sublinearly with data size.
Revisiting Unreasonable Effectiveness of Data in Deep Learning Era
TLDR
It is found that performance on vision tasks increases logarithmically with the volume of training data, and it is shown that representation learning (or pre-training) still holds a lot of promise.
A Constructive Prediction of the Generalization Error Across Scales
TLDR
This work presents a functional form which approximates well the generalization error in practice, and shows that the form both fits the observations well across scales, and provides accurate predictions from small- to large-scale models and data.
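The extrapolation idea described above can be illustrated with a plain power law fitted on small-scale runs and then evaluated far beyond the fitted range. This is a toy sketch; the cited work proposes its own, richer functional form spanning both model and data scales:

```python
import numpy as np

# Illustrative only: fit a power law on small-scale measurements, then
# extrapolate to a dataset 10x larger than anything in the fit.
n_small = np.array([1_000, 2_000, 4_000, 8_000])
err_small = 1.5 * n_small ** -0.4          # stand-in for measured errors

slope, log_a = np.polyfit(np.log(n_small), np.log(err_small), 1)

def predict(n):
    """Extrapolate the fitted log-log line to a new dataset size."""
    return np.exp(log_a) * n ** slope

err_at_80k = predict(80_000)
print(float(err_at_80k))
```

With real (noisy) measurements the extrapolation carries uncertainty that grows with distance from the fitted range, which is exactly why a well-chosen functional form matters.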
A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark
Representation learning promises to unlock deep learning for the long tail of vision tasks without expensive labelled datasets. Yet, the absence of a unified evaluation for general visual representations…
Deep Residual Learning for Image Recognition
TLDR
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Spectrally-normalized margin bounds for neural networks
TLDR
This bound is empirically investigated for a standard AlexNet trained with SGD on the MNIST and CIFAR-10 datasets, with both original and random labels; the bound, the Lipschitz constants, and the excess risks are all in direct correlation, suggesting both that SGD selects predictors whose complexity scales with the difficulty of the learning task and that the presented bound is sensitive to this complexity.
Aggregated Residual Transformations for Deep Neural Networks
TLDR
On the ImageNet-1K dataset, it is empirically shown that, even under the restricted condition of maintaining complexity, increasing cardinality improves classification accuracy and is more effective than going deeper or wider when capacity is increased.
Gradient Centralization: A New Optimization Technique for Deep Neural Networks
TLDR
It is shown that GC can regularize both the weight space and output feature space so that it can boost the generalization performance of DNNs, and improves the Lipschitzness of the loss function and its gradient so that the training process becomes more efficient and stable.