# How Data Augmentation affects Optimization for Linear Regression

@inproceedings{Hanin2020HowDA, title={How Data Augmentation affects Optimization for Linear Regression}, author={Boris Hanin and Yi Sun}, year={2020} }

Though data augmentation has rapidly emerged as a key tool for optimization in modern machine learning, a clear picture of how augmentation schedules affect optimization and interact with optimization hyperparameters such as the learning rate is only beginning to emerge. In the spirit of classical convex optimization and recent work on implicit bias, the present work analyzes the effect of augmentation on optimization in the simple convex setting of linear regression with MSE loss. We find joint schedules for…
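The paper's setting can be illustrated with a minimal sketch: SGD on the MSE loss of a linear model, where each step sees a freshly augmented copy of the data. The additive-noise augmentation, learning rate, and step count below are illustrative assumptions, not the specific schedules the paper derives.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: y = X @ w_true + label noise.
n, d = 100, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def augment(X, sigma=0.1):
    # Additive Gaussian noise augmentation (one illustrative choice;
    # in expectation it acts like an l2 / ridge penalty of strength sigma^2).
    return X + sigma * rng.normal(size=X.shape)

# Gradient descent on the MSE loss, resampling the augmentation each step.
# The paper studies how such augmentation schedules interact with the
# learning rate; here both are simply held fixed.
w = np.zeros(d)
lr = 0.01
for step in range(2000):
    Xa = augment(X)
    grad = 2.0 / n * Xa.T @ (Xa @ w - y)
    w -= lr * grad

print(np.linalg.norm(w - w_true))  # small: close to the true weights
```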


## References

Showing 1–10 of 36 references

Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules

- Computer Science, Mathematics · ICML
- 2019

This paper introduces a new data augmentation algorithm, Population Based Augmentation (PBA), which generates nonstationary augmentation policy schedules instead of a fixed augmentation policy.

A Kernel Theory of Modern Data Augmentation

- Computer Science, Mathematics · ICML
- 2019

This paper provides a general model of augmentation as a Markov process and shows that kernels appear naturally with respect to this model even when kernel classifiers are not used explicitly; it then directly analyzes the effect of augmentation on kernel classifiers.
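The kernel view above can be sketched by averaging a base kernel over random augmentations of its inputs. The RBF kernel, the sign-flip augmentation, and the Monte Carlo estimator below are illustrative assumptions, not the paper's construction; the point is that the averaged kernel becomes invariant to the augmentation group.

```python
import numpy as np

rng = np.random.default_rng(2)

def rbf(x, y, gamma=1.0):
    # Base (non-invariant) RBF kernel.
    return np.exp(-gamma * np.sum((x - y) ** 2))

def augmented_kernel(x, y, transform, n_samples=4000):
    # Monte Carlo estimate of the augmentation-averaged kernel
    # k_bar(x, y) = E[ k(T x, T' y) ] over independent random transforms.
    total = 0.0
    for _ in range(n_samples):
        total += rbf(transform(x), transform(y))
    return total / n_samples

# Illustrative augmentation: global sign flip with probability 1/2.
flip = lambda v: v * rng.choice([-1.0, 1.0])

x = np.array([1.0, 0.5])
y = np.array([0.9, 0.6])
# The averaged kernel is (approximately) invariant: k_bar(x, y) ~ k_bar(-x, y).
print(augmented_kernel(x, y, flip), augmented_kernel(-x, y, flip))
```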

Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate

- Computer Science · ICLR
- 2021

The theory explains several folk practices used for SGD hyperparameter tuning, such as linearly scaling the initial learning rate with the batch size, and continuing to run SGD at a high learning rate even when the loss stops decreasing.

The Penalty Imposed by Ablated Data Augmentation

- Computer Science, Mathematics · arXiv
- 2020

It is proved that ablated data augmentation is equivalent to optimizing the ordinary least squares objective with an additional penalty, called the Contribution Covariance Penalty, with an analogous characterization for inverted dropout; an empirical version of the result is demonstrated by replacing contributions with attributions and coefficients with average gradients.

Does Data Augmentation Lead to Positive Margin?

- Computer Science, Mathematics · ICML
- 2019

Lower bounds on the number of augmented data points required for a non-zero margin are presented, and it is shown that commonly used data augmentation techniques may only introduce a significant margin after adding exponentially many points to the data set.

Optimization Methods for Large-Scale Machine Learning

- Computer Science, Mathematics · SIAM Rev.
- 2018

A major theme of this study is that large-scale machine learning represents a distinctive setting in which the stochastic gradient method has traditionally played a central role while conventional gradient-based nonlinear optimization techniques typically falter, leading to a discussion of the next generation of optimization methods for large-scale machine learning.

On the Generalization Effects of Linear Transformations in Data Augmentation

- Computer Science, Mathematics · ICML
- 2020

This work considers a family of linear transformations and studies their effects on the ridge estimator in an over-parametrized linear regression setting, and proposes an augmentation scheme that searches the space of transformations according to how uncertain the model is about the transformed data.

The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning

- Computer Science, Mathematics · ICML
- 2018

The key observation is that most modern learning architectures are over-parametrized and are trained to interpolate the data by driving the empirical loss close to zero; yet it remains unclear why these interpolated solutions perform well on test data.

The Implicit Bias of Gradient Descent on Separable Data

- Mathematics, Computer Science · J. Mach. Learn. Res.
- 2018

We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the…

Invariance reduces Variance: Understanding Data Augmentation in Deep Learning and Beyond

- Mathematics, Computer Science · arXiv
- 2019

A theoretical framework is developed to begin to shed light on how data augmentation can be used in problems with symmetry where other approaches are prevalent, such as cryo-electron microscopy (cryo-EM).