Statistical inference with implicit SGD: proximal Robbins-Monro vs. Polyak-Ruppert

@inproceedings{Lee2022StatisticalIW,
  title={Statistical inference with implicit SGD: proximal Robbins-Monro vs. Polyak-Ruppert},
  author={Yoonhyung Lee and Sungdong Lee and Joong-Ho Won},
  booktitle={ICML},
  year={2022}
}
Implicit stochastic gradient descent (ISGD), a proximal version of SGD, is gaining interest in the literature because it is more stable than (explicit) SGD. In this paper, we conduct an in-depth analysis of the two modes of ISGD for smooth convex functions, namely the proximal Robbins-Monro (proxRM) and proximal Polyak-Ruppert (proxPR) procedures, for their use in statistical inference on model parameters. Specifically, we derive nonasymptotic point estimation error bounds of both proxRM and proxPR…
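
To make the implicit update concrete, here is a minimal sketch for the special case of squared-error loss, where the implicit equation θ_n = θ_{n-1} + γ_n (y_n − x_nᵀθ_n) x_n can be solved in closed form. It is an illustration under stated assumptions, not the paper's implementation: the function names, the step-size schedule γ_n = γ_0 n^(−α), and the synthetic data are all hypothetical, and reading proxRM as the last iterate and proxPR as the running average of iterates is an interpretation of the abstract.

```python
import random


def implicit_sgd_least_squares(data, dim, gamma0=1.0, alpha=0.6):
    """Implicit SGD on squared-error loss (illustrative sketch).

    Returns (last_iterate, averaged_iterate). The last iterate plays the
    role of a Robbins-Monro-style estimate and the running average that
    of a Polyak-Ruppert-style estimate (assumed reading of the abstract).
    """
    theta = [0.0] * dim
    theta_bar = [0.0] * dim
    for n, (x, y) in enumerate(data, start=1):
        gamma = gamma0 * n ** (-alpha)  # assumed step-size schedule
        residual = y - sum(xi * ti for xi, ti in zip(x, theta))
        sq_norm = sum(xi * xi for xi in x)
        # Closed-form solution of the implicit equation for squared loss:
        # the explicit step gamma is shrunk by 1 / (1 + gamma * ||x||^2).
        step = gamma / (1.0 + gamma * sq_norm)
        theta = [ti + step * residual * xi for ti, xi in zip(theta, x)]
        # Running average of the iterates.
        theta_bar = [b + (t - b) / n for b, t in zip(theta_bar, theta)]
    return theta, theta_bar


def make_data(n, true_theta, noise_sd=0.1, seed=0):
    """Synthetic linear-regression data (hypothetical, for illustration)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in true_theta]
        y = sum(xi * ti for xi, ti in zip(x, true_theta)) + rng.gauss(0.0, noise_sd)
        data.append((x, y))
    return data


true_theta = [1.0, -2.0, 0.5]
last, avg = implicit_sgd_least_squares(make_data(5000, true_theta), dim=3)
```

The shrinkage factor 1 / (1 + γ_n‖x_n‖²) is what keeps the update bounded even when γ_0 is chosen too large, which is the stability property the abstract refers to. For losses without a closed-form solution, each step would instead require solving a one-dimensional fixed-point problem.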

References

The proximal Robbins–Monro method

The need for statistical estimation with large data sets has reinvigorated interest in iterative procedures and stochastic optimization. Stochastic approximations are at the forefront of this recent

Statistical inference for model parameters in stochastic gradient descent

TLDR
This work investigates the problem of statistical inference of true model parameters based on SGD when the population loss function is strongly convex and satisfies certain smoothness conditions, and proposes two consistent estimators of the asymptotic covariance of the average iterate from SGD.

Towards Stability and Optimality in Stochastic Gradient Descent

TLDR
This work proposes a new iterative procedure termed averaged implicit SGD (AI-SGD), which employs an implicit update at each iteration. The update is related to proximal operators in optimization, and the procedure achieves performance competitive with other state-of-the-art procedures.

Statistical analysis of stochastic gradient methods for generalized linear models

TLDR
This work develops a computationally efficient algorithm to implement implicit SGD learning of GLMs and obtains exact formulas for the bias and variance of both updates, which leads to important observations on their comparative statistical properties.

Statistical inference using SGD

TLDR
A novel method for frequentist statistical inference in M-estimation problems, based on stochastic gradient descent with a fixed step size, is presented, and it is demonstrated that its accuracy is comparable to classical statistical methods while requiring potentially far less computation.

Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning

TLDR
This work provides a non-asymptotic analysis of the convergence of two well-known algorithms, stochastic gradient descent as well as a simple modification where iterates are averaged, suggesting that a learning rate proportional to the inverse of the number of iterations, while leading to the optimal convergence rate, is not robust to the lack of strong convexity or the setting of the proportionality constant.

Statistical inference for the population landscape via moment‐adjusted stochastic gradients

TLDR
The moment-adjusting idea, motivated by ‘error standardization’ in statistics, achieves an effect similar to acceleration in the first‐order optimization methods that are used to fit generalized linear models.

Nonasymptotic convergence of stochastic proximal point methods for constrained convex optimization

TLDR
This work introduces a new variant of the SPP method for solving stochastic convex problems subject to (in)finite intersection of constraints satisfying a linear regularity condition, and proves new nonasymptotic convergence results for convex Lipschitz continuous objective functions.

Asymptotic and finite-sample properties of estimators based on stochastic gradients

TLDR
The theoretical analysis provides the first full characterization of the asymptotic behavior of both standard and implicit stochastic gradient descent-based estimators, including finite-sample error bounds, and suggests that implicit stochastic gradient descent procedures are poised to become a workhorse for approximate inference from large data sets.

Stochastic (Approximate) Proximal Point Methods: Convergence, Optimality, and Adaptivity

We develop model-based methods for solving stochastic convex optimization problems, introducing the approximate-proximal point, or aProx, family, which includes stochastic subgradient, proximal