How Data Augmentation affects Optimization for Linear Regression

  • Boris Hanin, Yi Sun
  • Published 21 October 2020
  • Computer Science, Mathematics
  • Corpus ID: 239998775
Though data augmentation has rapidly emerged as a key tool for optimization in modern machine learning, a clear picture of how augmentation schedules affect optimization and interact with optimization hyperparameters such as learning rate is nascent. In the spirit of classical convex optimization and recent work on implicit bias, the present work analyzes the effect of augmentation on optimization in the simple convex setting of linear regression with MSE loss. We find joint schedules for… 
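The setting the abstract describes can be made concrete with a minimal numerical sketch: gradient descent on linear regression with MSE loss, where each step draws a freshly augmented copy of the data under a per-step schedule. This is not the paper's exact setup; the additive-Gaussian-noise augmentation and the particular decaying schedule `sigma_t` below are illustrative assumptions.

```python
import numpy as np

# Sketch (assumed setup, not the paper's): SGD on linear regression with
# MSE loss, where each step applies a scheduled augmentation -- here,
# additive Gaussian input noise with a decaying scale sigma_t.
rng = np.random.default_rng(0)
n, d = 100, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)
lr = 0.01
for t in range(2000):
    sigma_t = 0.5 / (1 + t / 100)                    # assumed decaying schedule
    X_aug = X + sigma_t * rng.normal(size=X.shape)   # fresh augmentation each step
    grad = (2.0 / n) * X_aug.T @ (X_aug @ w - y)     # MSE gradient on augmented data
    w -= lr * grad

print(np.linalg.norm(w - w_true))
```

In expectation, this particular augmentation adds a ridge-like penalty of strength `sigma_t**2` to the MSE objective, so the schedule effectively anneals a regularizer toward zero as training proceeds.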

Figures from this paper

Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules
This paper introduces a new data augmentation algorithm, Population Based Augmentation (PBA), which generates nonstationary augmentation policy schedules instead of a fixed augmentation policy.
A Kernel Theory of Modern Data Augmentation
This paper provides a general model of augmentation as a Markov process, shows that kernels appear naturally with respect to this model even when kernel classifiers are not explicitly employed, and directly analyzes the effect of augmentation on kernel classifiers.
Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate
The theory explains several folk practices used for SGD hyperparameter tuning, such as linearly scaling the initial learning rate with batch size, and running SGD with a high learning rate even after the loss stops decreasing.
The Penalty Imposed by Ablated Data Augmentation
It is proved that ablated data augmentation (and, analogously, inverted dropout) is equivalent to optimizing the ordinary least squares objective with an added penalty, called the Contribution Covariance Penalty, and an empirical version of the result is demonstrated when contributions are replaced with attributions and coefficients with average gradients.
Does Data Augmentation Lead to Positive Margin?
Lower bounds on the number of augmented data points required for non-zero margin are presented, and it is shown that commonly used DA techniques may only introduce significant margin after adding exponentially many points to the data set.
Optimization Methods for Large-Scale Machine Learning
A major theme of this study is that large-scale machine learning represents a distinctive setting in which the stochastic gradient method has traditionally played a central role while conventional gradient-based nonlinear optimization techniques typically falter, leading to a discussion about the next generation of optimization methods for large-scale machine learning.
On the Generalization Effects of Linear Transformations in Data Augmentation
This work considers a family of linear transformations and studies their effects on the ridge estimator in an over-parametrized linear regression setting, and proposes an augmentation scheme that searches over the space of transformations according to how uncertain the model is about the transformed data.
The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning
The key observation is that most modern learning architectures are over-parametrized and are trained to interpolate the data by driving the empirical loss close to zero, yet it is still unclear why these interpolated solutions perform well on test data.
The Implicit Bias of Gradient Descent on Separable Data
We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the max-margin (hard margin SVM) solution.
Invariance reduces Variance: Understanding Data Augmentation in Deep Learning and Beyond
A theoretical framework is developed to shed light on how data augmentation can be used in problems with symmetry where other approaches are prevalent, such as cryo-electron microscopy (cryo-EM).