• Corpus ID: 239998775

How Data Augmentation affects Optimization for Linear Regression

@inproceedings{Hanin2020HowDA,
  title={How Data Augmentation affects Optimization for Linear Regression},
  author={Boris Hanin and Yi Sun},
  year={2020}
}
  • B. Hanin, Yi Sun
  • Published 21 October 2020
  • Computer Science, Mathematics
Though data augmentation has rapidly emerged as a key tool for optimization in modern machine learning, a clear picture of how augmentation schedules affect optimization and interact with optimization hyperparameters such as learning rate is nascent. In the spirit of classical convex optimization and recent work on implicit bias, the present work analyzes the effect of augmentation on optimization in the simple convex setting of linear regression with MSE loss. We find joint schedules for… 
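To make the setting concrete, here is a minimal, illustrative sketch (not the authors' code) of SGD on linear regression with MSE loss, in which the augmentation strength and the learning rate follow joint schedules. The schedule forms and the names `lr_schedule` and `aug_std_schedule` are assumptions for illustration only.

```python
import numpy as np

# Minimal sketch: SGD on linear regression with MSE loss, where each
# sampled point is augmented (here: additive Gaussian input noise) with
# a strength that follows a schedule, jointly with the learning rate.
# All names and schedule choices are illustrative, not from the paper.

rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def lr_schedule(t, lr0=0.05):
    # Hypothetical decaying learning-rate schedule.
    return lr0 / (1.0 + 0.01 * t)

def aug_std_schedule(t, s0=0.5):
    # Hypothetical decaying augmentation-strength schedule.
    return s0 / (1.0 + 0.05 * t)

w = np.zeros(d)
for t in range(2000):
    i = rng.integers(n)
    x_aug = X[i] + aug_std_schedule(t) * rng.normal(size=d)  # augment input
    grad = 2.0 * (x_aug @ w - y[i]) * x_aug                  # MSE gradient
    w -= lr_schedule(t) * grad

print("final training MSE:", float(np.mean((X @ w - y) ** 2)))
```

The point of the sketch is only to show where an augmentation schedule and a learning-rate schedule enter the same SGD update, which is the interaction the abstract studies.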

References

Showing 1–10 of 36 references
Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules
TLDR
This paper introduces a new data augmentation algorithm, Population Based Augmentation (PBA), which generates nonstationary augmentation policy schedules instead of a fixed augmentation policy.
A Kernel Theory of Modern Data Augmentation
TLDR
This paper provides a general model of augmentation as a Markov process, shows that kernels appear naturally with respect to this model even when kernel classifiers are not employed, and analyzes more directly the effect of augmentation on kernel classifiers.
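As a hedged paraphrase of the kernel view (our notation, not the paper's exact statement): averaging a base kernel over independent random augmentations of each input yields the kernel that augmented training implicitly works with.

```latex
% Hedged sketch, our notation: k is a base kernel and T, T' are
% independent random augmentations applied to the two inputs.
\[
  k_{\mathrm{aug}}(x, x') \;=\; \mathbb{E}_{T,\,T'}\!\left[\, k\big(Tx,\; T'x'\big) \,\right]
\]
```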
Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate
TLDR
The theory explains several folk-art practices used for SGD hyperparameter tuning, such as linearly scaling the initial learning rate with the batch size, and keeping SGD running at a high learning rate even after the loss stops decreasing.
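The linear scaling heuristic mentioned in the summary fits in two lines; this is a generic sketch, not code from the paper, and `base_batch = 256` is an arbitrary illustrative default.

```python
def scaled_lr(base_lr: float, batch_size: int, base_batch: int = 256) -> float:
    # Linear scaling heuristic: grow the initial learning rate in
    # proportion to the batch size (base_batch is an illustrative default).
    return base_lr * batch_size / base_batch

# Example: scaled_lr(0.1, 1024) == 0.4
```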
The Penalty Imposed by Ablated Data Augmentation
TLDR
It is proved that ablated data augmentation is equivalent to optimizing the ordinary least squares objective along with a penalty called the Contribution Covariance Penalty, with an analogous result for inverted dropout; an empirical version of the result is demonstrated by replacing contributions with attributions and coefficients with average gradients.
Does Data Augmentation Lead to Positive Margin?
TLDR
Lower bounds on the number of augmented data points required for non-zero margin are presented, and it is shown that commonly used data augmentation techniques may introduce significant margin only after adding exponentially many points to the data set.
Optimization Methods for Large-Scale Machine Learning
TLDR
A major theme of this study is that large-scale machine learning represents a distinctive setting in which the stochastic gradient method has traditionally played a central role while conventional gradient-based nonlinear optimization techniques typically falter, leading to a discussion about the next generation of optimization methods for large-scale machine learning.
On the Generalization Effects of Linear Transformations in Data Augmentation
TLDR
This work considers a family of linear transformations, studies their effects on the ridge estimator in an over-parametrized linear regression setting, and proposes an augmentation scheme that searches over the space of transformations according to how uncertain the model is about the transformed data.
The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning
TLDR
The key observation is that most modern learning architectures are over-parametrized and are trained to interpolate the data by driving the empirical loss close to zero, yet it is still unclear why these interpolated solutions perform well on test data.
The Implicit Bias of Gradient Descent on Separable Data
We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the max-margin (hard-margin SVM) solution.
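Stated compactly (our notation, a sketch of this well-known result rather than a quotation): for linearly separable data $(x_i, y_i)$ with $y_i \in \{-1, +1\}$, the gradient-descent iterates $w(t)$ satisfy

```latex
\[
  \lim_{t \to \infty} \frac{w(t)}{\lVert w(t) \rVert}
  \;=\;
  \frac{\hat{w}}{\lVert \hat{w} \rVert},
  \qquad
  \hat{w} \;=\; \operatorname*{arg\,min}_{w}\; \lVert w \rVert_2^2
  \quad \text{s.t.} \quad y_i\, w^{\top} x_i \ge 1 \;\; \forall i,
\]
% i.e., the direction of the hard-margin SVM (max-margin) solution.
```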
Invariance reduces Variance: Understanding Data Augmentation in Deep Learning and Beyond
TLDR
A theoretical framework is developed to begin to shed light on how data augmentation can be used in problems with symmetry where other approaches are prevalent, such as cryo-electron microscopy (cryo-EM).