# Implicit Regularization Properties of Variance Reduced Stochastic Mirror Descent

@article{Luo2022ImplicitRP,
title={Implicit Regularization Properties of Variance Reduced Stochastic Mirror Descent},
author={Yiling Luo and Xiaoming Huo and Yajun Mei},
journal={2022 IEEE International Symposium on Information Theory (ISIT)},
year={2022},
pages={696-701}
}
• Published 29 April 2022
• Mathematics, Computer Science
In machine learning and statistical data analysis, we often encounter objective functions that are finite sums: the number of terms can equal the sample size, which may be enormous. In such settings, the stochastic mirror descent (SMD) algorithm is a numerically efficient method, since each iteration involves only a small subset of the data. The variance-reduced version of SMD (VRSMD) can further improve on SMD by converging faster. On the other hand, algorithms…
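
To ground the terms in the abstract, the sketch below implements an SVRG-style variance-reduced stochastic mirror descent loop for a least-squares objective, using the squared ℓ_p-norm potential as the mirror map. This is only an illustration of the general technique: the paper's exact algorithm, mirror map, step sizes, and analysis may differ, and every name and constant below is an assumption made for the example.

```python
import numpy as np

def grad_psi(x, p):
    """Gradient of the mirror map psi(x) = 0.5 * ||x||_p^2."""
    norm = np.linalg.norm(x, p)
    if norm == 0:
        return np.zeros_like(x)
    return np.sign(x) * np.abs(x) ** (p - 1) / norm ** (p - 2)

def grad_psi_star(theta, p):
    """Gradient of the conjugate map: same formula with the dual exponent q."""
    q = p / (p - 1)
    return grad_psi(theta, q)

def vrsmd_least_squares(A, b, p=3.0, lr=0.02, epochs=30, seed=0):
    """SVRG-style variance-reduced stochastic mirror descent on
    F(x) = (1/2n) * ||A x - b||^2, started from zero (illustrative only)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(epochs):
        x_ref = x.copy()
        full_grad = A.T @ (A @ x_ref - b) / n          # anchor (full) gradient
        for _ in range(n):
            i = rng.integers(n)
            g_i = (A[i] @ x - b[i]) * A[i]             # stochastic gradient at x
            g_ref = (A[i] @ x_ref - b[i]) * A[i]       # same sample at the anchor
            g = g_i - g_ref + full_grad                # variance-reduced estimate
            # mirror step: take the step in the dual space, map back
            x = grad_psi_star(grad_psi(x, p) - lr * g, p)
    return x

# toy usage: an over-determined least-squares problem
A = np.random.default_rng(1).normal(size=(200, 10))
x_true = np.zeros(10)
x_true[:3] = 1.0
b = A @ x_true
print(np.round(vrsmd_least_squares(A, b), 3))
```

With the Euclidean mirror map (p = 2) the update reduces to variance-reduced SGD; a non-Euclidean mirror map is what allows the iterates to carry an implicit bias other than the minimum ℓ2-norm one.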

## References

Showing 1–10 of 23 references

### The Statistical Complexity of Early Stopped Mirror Descent

• Computer Science
NeurIPS
• 2020
The theory is applied to recover, via short and transparent proofs, several recent results from the implicit regularization literature, while also showing how to improve upon them in some settings.

### Stochastic Gradient/Mirror Descent: Minimax Optimality and Implicit Regularization

• Computer Science
ICLR
• 2019
It is argued that this identity can be used in the so-called "highly over-parameterized" nonlinear setting to provide insight into why SMD (and SGD) may have similar convergence and implicit regularization properties for deep learning.

### Exact expressions for double descent and implicit regularization via surrogate random design

• Computer Science, Mathematics
NeurIPS
• 2020
This work provides the first exact non-asymptotic expressions for double descent of the minimum-norm linear estimator and introduces a new mathematical tool of independent interest: the class of random matrices for which the determinant commutes with expectation.

### On the Origin of Implicit Regularization in Stochastic Gradient Descent

• Computer Science
ICLR
• 2021
It is proved that for SGD with random shuffling, the mean SGD iterate also stays close to the path of gradient flow if the learning rate is small and finite, but on a modified loss.

### Implicit regularization via Hadamard product over-parametrization in high-dimensional linear regression

• Mathematics, Computer Science
• 2019
It is shown that, under certain conditions, this over-parametrization leads to implicit regularization: if gradient descent is applied directly to the residual sum of squares with sufficiently small initial values, then under a proper early-stopping rule the iterates converge to a nearly sparse, rate-optimal solution with better accuracy than explicitly regularized approaches.
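
For a concrete picture of the procedure this summary refers to, here is a minimal sketch of gradient descent on a signed Hadamard-product over-parametrization, beta = u ⊙ u − v ⊙ v, from a small constant initialization, with a fixed step budget standing in for the early-stopping rule. The parametrization, initialization scale, step size, and stopping rule in the cited paper may differ; everything below is illustrative.

```python
import numpy as np

def hadamard_gd(X, y, alpha=1e-3, lr=0.05, steps=3000):
    """Gradient descent on the residual sum of squares under the
    over-parametrization beta = u*u - v*v, from small initialization alpha.
    The fixed step budget plays the role of early stopping (illustrative)."""
    n, d = X.shape
    u = np.full(d, alpha)
    v = np.full(d, alpha)
    for _ in range(steps):
        beta = u * u - v * v
        g = X.T @ (X @ beta - y) / n                   # gradient w.r.t. beta
        u, v = u - lr * 2 * g * u, v + lr * 2 * g * v  # chain rule through beta
    return u * u - v * v

# toy usage: a sparse signal with a Gaussian design
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
beta_true = np.zeros(50)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.1 * rng.normal(size=100)
print(np.round(hadamard_gd(X, y)[:6], 2))
```

Coordinates that start at alpha and receive no consistent gradient signal stay close to zero, which is the source of the near-sparsity mentioned in the summary.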

### The Implicit Regularization of Stochastic Gradient Flow for Least Squares

• Mathematics, Computer Science
ICML
• 2020