Corpus ID: 224820076

Learning Curves for Analysis of Deep Networks

Derek Hoiem, Tanmay Gupta, Zhizhong Li, and Michal Shlapentokh-Rothman. Learning Curves for Analysis of Deep Networks. International Conference on Machine Learning.
A learning curve models a classifier's test error as a function of the number of training samples. Prior works show that learning curves can be used to select model parameters and extrapolate performance. We investigate how to use learning curves to analyze the impact of design choices, such as pre-training, architecture, and data augmentation. We propose a method to robustly estimate learning curves, abstract their parameters into error and data-reliance, and evaluate the effectiveness of… 
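The learning-curve idea above can be sketched in a few lines of code. As a simplification (the paper's actual estimator abstracts curves into error and data-reliance parameters; the form and names here are assumptions for illustration), suppose test error follows a power law err(n) ≈ a·n^(−b). Such a curve can be fit by ordinary least squares in log-log space:

```python
import math

def fit_power_law(ns, errs):
    """Fit err(n) ~ a * n**(-b) by least squares in log-log space.

    A simplified stand-in for a learning-curve estimator: taking logs
    turns the power law into a line, log(err) = log(a) - b*log(n),
    which we fit with closed-form simple linear regression.
    """
    xs = [math.log(n) for n in ns]
    ys = [math.log(e) for e in errs]
    k = len(xs)
    mx = sum(xs) / k
    my = sum(ys) / k
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return math.exp(intercept), -slope  # (a, b)

# Synthetic curve generated from err(n) = 2.0 * n**(-0.5):
ns = [100, 400, 1600, 6400]
errs = [2.0 * n ** -0.5 for n in ns]
a, b = fit_power_law(ns, errs)  # recovers a ≈ 2.0, b ≈ 0.5
```

On real measurements the fit would of course not be exact, and robust estimation (as the paper proposes) would weight or bound the per-point variance rather than fit raw log-errors.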


The Shape of Learning Curves: a Review

  • T. Viering, M. Loog
  • Computer Science
    IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 2022
This review recounts the origins of the term, gives a formal definition of the learning curve, and offers a comprehensive overview of the literature on the shape of learning curves.

Learning Curve Theory

This work develops and theoretically analyses the simplest possible (toy) model that can exhibit n^(−β) learning curves for arbitrary power β > 0, and determines whether power laws are universal or depend on the data distribution.

A Meta-Learning Approach to Predicting Performance and Data Requirements

This work introduces a novel piecewise power law (PPL) that handles the two data regimes differently, together with a random-forest regressor trained via meta-learning that generalizes across classification/detection tasks, ResNet/ViT-based architectures, and random/pre-trained initializations.

Impact of dataset size and long-term ECoG-based BCI usage on deep learning decoders performance

The results showed that DL decoders had dataset-size requirements similar to the multilinear model while achieving higher decoding performance, suggesting improved motor-imagery patterns and patient adaptation over the long-term BCI experiment.

A Survey of Learning Curves with Bad Behavior: or How More Data Need Not Lead to Better Performance

This survey focuses on learning curves showing that more data does not necessarily lead to better generalization performance, a result that surprises many researchers in the field of artificial intelligence.

Key concepts, common pitfalls, and best practices in artificial intelligence and machine learning: focus on radiomics

  • B. Koçak
  • Computer Science
    Diagnostic and Interventional Radiology
  • 2022
Key issues included in this article are validity of the scientific question, unrepresentative samples, sample size, missing data, quality of reference standard, batch effect, reliability of features, feature scaling, multi-collinearity, class imbalance, data and target leakage, high-dimensional data, optimization, overfitting, generalization, performance metrics, clinical utility, and comparison with conventional statistical and clinical methods.

Optimizing Data Collection for Machine Learning

The Learn-Optimize-Collect framework is proposed: a new paradigm that casts data collection as a formal optimization problem in which designers specify performance targets, collection costs, a time horizon, and penalties for failing to meet the targets.

How Much More Data Do I Need? Estimating Requirements for Downstream Tasks

This work considers a broad class of computer vision tasks and systematically investigates a family of functions that generalize the power-law function to allow for better estimation of data requirements, and shows that incorporating a tuned correction factor and collecting over multiple rounds significantly improves the performance of the data estimators.
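Estimating data requirements of this kind can be illustrated with a hedged sketch. Assuming, for simplicity, an exact power law err(n) = a·n^(−b) (the cited work's point is precisely that a tuned correction factor and multi-round collection beat this naive form), the sample count for a target error follows by inverting the fit:

```python
def data_needed(a, b, target_err):
    """Invert err(n) = a * n**(-b) to estimate the number of training
    samples needed to reach target_err: n = (a / target_err)**(1/b)."""
    return (a / target_err) ** (1.0 / b)

# With a = 2.0 and b = 0.5, reaching 5% error requires
# (2.0 / 0.05)**2 = 1600 samples under this assumed model.
n_est = data_needed(2.0, 0.5, 0.05)  # → 1600.0
```

In practice such an estimate would be revised over several collection rounds, since small-sample fits tend to under-predict the data actually required.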

Simple Control Baselines for Evaluating Transfer Learning

This work shares an evaluation standard that aims to quantify and communicate transfer-learning performance in an informative and accessible setup, and encourages using and reporting the suggested control baselines to gain a more meaningful understanding of transfer learning.

Data Scaling Laws in NMT: The Effect of Noise and Architecture

This work establishes that the test loss of encoder-decoder transformer models scales as a power law in the number of training samples, with a dependence on the model size, and systematically varies aspects of the training setup to understand how they impact the data scaling laws.



Learning Curves: Asymptotic Values and Rate of Convergence

This work proposes a practical and principled predictive method that avoids the costly procedure of training poor classifiers on the whole training set, and it is demonstrated for both single- and multi-layer networks.

Learning Multiple Layers of Features from Tiny Images

It is shown how to train a multi-layer generative model that learns to extract meaningful features which resemble those found in the human visual cortex, using a novel parallelization algorithm to distribute the work among multiple machines connected on a network.

Deep Learning Scaling is Predictable, Empirically

A large scale empirical characterization of generalization error and model size growth as training sets grow is presented and it is shown that model size scales sublinearly with data size.

Revisiting Unreasonable Effectiveness of Data in Deep Learning Era

It is found that performance on vision tasks increases logarithmically with training-data volume, and it is shown that representation learning (or pre-training) still holds a lot of promise.

A Constructive Prediction of the Generalization Error Across Scales

This work presents a functional form which approximates well the generalization error in practice, and shows that the form both fits the observations well across scales, and provides accurate predictions from small- to large-scale models and data.

A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark

Representation learning promises to unlock deep learning for the long tail of vision tasks without expensive labelled datasets. Yet, the absence of a unified evaluation for general visual representations hinders progress.

Deep Residual Learning for Image Recognition

This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.

Spectrally-normalized margin bounds for neural networks

This bound is empirically investigated for a standard AlexNet network trained with SGD on the MNIST and CIFAR-10 datasets, with both original and random labels; the bound, the Lipschitz constants, and the excess risks are all in direct correlation, suggesting both that SGD selects predictors whose complexity scales with the difficulty of the learning task, and that the presented bound is sensitive to this complexity.

Aggregated Residual Transformations for Deep Neural Networks

On the ImageNet-1K dataset, it is empirically shown that, even under the restricted condition of maintained complexity, increasing cardinality improves classification accuracy and is more effective than going deeper or wider as capacity is increased.

Gradient Centralization: A New Optimization Technique for Deep Neural Networks

It is shown that GC can regularize both the weight space and output feature space so that it can boost the generalization performance of DNNs, and improves the Lipschitzness of the loss function and its gradient so that the training process becomes more efficient and stable.