# Kalman Gradient Descent: Adaptive Variance Reduction in Stochastic Optimization

@article{Vuckovic2018KalmanGD, title={Kalman Gradient Descent: Adaptive Variance Reduction in Stochastic Optimization}, author={James Vuckovic}, journal={ArXiv}, year={2018}, volume={abs/1810.12273} }

We introduce Kalman Gradient Descent, a stochastic optimization algorithm that applies Kalman filtering to the gradient estimates in stochastic gradient descent, adaptively reducing their variance. We present both a theoretical convergence analysis in a non-convex setting and experimental results demonstrating improved performance on a variety of machine learning problems, including neural networks and black-box variational inference. We also present a distributed version of our…
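The core idea — filter the noisy gradient stream before taking a descent step — can be sketched as follows. This is an illustrative toy, not the paper's exact algorithm: it tracks each gradient coordinate with an independent scalar Kalman filter under assumed random-walk dynamics, and the hyperparameters (`q`, `r`, `lr`) are made up for the demo.

```python
import numpy as np

def kalman_gradient_descent(grad_fn, x0, lr=0.1, q=1e-2, r=0.25, steps=300, seed=0):
    """Toy sketch of Kalman-filtered SGD. Each gradient coordinate is
    tracked by an independent scalar Kalman filter with random-walk
    dynamics: the latent gradient follows g_t = g_{t-1} + process noise
    (variance q) and is observed through the stochastic gradient with
    observation noise (variance r). The descent step then uses the
    filtered estimate instead of the raw sample."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    g_hat = np.zeros_like(x)   # filtered gradient estimate
    p = np.ones_like(x)        # posterior variance of the estimate
    for _ in range(steps):
        g_obs = grad_fn(x, rng)              # noisy gradient observation
        p = p + q                            # predict: variance grows
        k = p / (p + r)                      # Kalman gain
        g_hat = g_hat + k * (g_obs - g_hat)  # correct the estimate
        p = (1.0 - k) * p                    # updated posterior variance
        x = x - lr * g_hat                   # descend along filtered gradient
    return x

# Noisy gradient of f(x) = 0.5 * ||x||^2 (the true gradient is x itself)
noisy_grad = lambda x, rng: x + rng.normal(scale=0.5, size=x.shape)
x_star = kalman_gradient_descent(noisy_grad, x0=[5.0, -3.0])
```

The gain `k` interpolates between trusting the running estimate (small `k`) and the fresh sample (large `k`), so the effective smoothing adapts to the assumed noise levels rather than being a fixed momentum constant.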

## 9 Citations

### KOALA: A Kalman Optimization Algorithm with Loss Adaptivity

- Computer Science · AAAI
- 2022

A Kalman filter dynamical model for the evolution of the unknown parameters is used to capture the gradient dynamics of advanced methods such as Momentum and Adam; the resulting algorithm, KOALA, is an easy-to-implement, scalable, and efficient method for training neural networks.

### A Probabilistic Incremental Proximal Gradient Method

- Computer Science · IEEE Signal Processing Letters
- 2019

The PIPG algorithm takes the form of Bayesian filtering updates for a state-space model constructed by using the cost function, which makes it possible to utilize well-known exact or approximate Bayesian filters to solve large-scale regularized optimization problems.

### A Latent Variational Framework for Stochastic Optimization

- Computer Science · NeurIPS
- 2019

This framework establishes a direct connection between stochastic optimization algorithms and a secondary Bayesian inference problem on gradients, where a prior measure on noisy gradient observations determines the resulting algorithm.

### KaFiStO: A Kalman Filtering Framework for Stochastic Optimization

- Computer Science · ArXiv
- 2021

A Kalman filter dynamical model for the evolution of the unknown parameters is used to capture the gradient dynamics of advanced methods such as Momentum and Adam; the resulting algorithm, KaFiStO, is an easy-to-implement, scalable, and efficient method for training neural networks.

### Fast Stochastic Kalman Gradient Descent for Reinforcement Learning

- Computer Science · L4DC
- 2021

A randomized regularization technique called Stochastic Kalman Gradient Descent (SKGD) is introduced that, combined with a low-rank update, generates a sequence of feasible iterates suitable for large-scale optimization of non-linear function approximators.

### Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering

- Computer Science · ICBINB@NeurIPS
- 2020

It is proved that the model-based procedure converges in the noisy quadratic setting and can match the performance of well-tuned optimizers, an interesting step toward constructing self-tuning optimizers.

### An enhanced learning algorithm with a particle filter-based gradient descent optimizer method

- Computer Science · Neural Computing and Applications
- 2020

A particle filter-based gradient descent (PF-GD) optimizer is presented that can locate the global minimum with excellent performance and outperforms the conventional gradient descent optimizer, although it has some parameters that must be set before modeling.

### Kalman meets Bellman: Improving Policy Evaluation through Value Tracking

- Computer Science · ArXiv
- 2020

An optimization method called Kalman Optimization for Value Approximation (KOVA) is proposed and analyzed; it minimizes a regularized objective function that accounts for both parameter and noisy-return uncertainties, and can be incorporated as a policy evaluation component in policy optimization algorithms.

### Trust Region Value Optimization using Kalman Filtering

- Computer Science · ArXiv
- 2019

This work presents a novel optimization method, Kalman Optimization for Value Approximation (KOVA), based on the Extended Kalman Filter, which minimizes a regularized objective function by adopting a Bayesian perspective over both the value parameters and the noisy observed returns.

## References

Showing 1–10 of 34 references

### Kalman filtering in stochastic gradient algorithms: construction of a stopping rule

- Computer Science · 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing
- 2004

It is shown how a simple Kalman filter can be used to estimate the gradient, with an associated confidence, and thus construct a stopping rule for the algorithm; the approach is illustrated with a simple example.

### Variance Reduction for Stochastic Gradient Optimization

- Computer Science · NIPS
- 2013

This paper demonstrates how to construct control variates for two practical problems in stochastic gradient optimization: one convex, MAP estimation for logistic regression, and one non-convex, stochastic variational inference for latent Dirichlet allocation.

### Smoothed Gradients for Stochastic Variational Inference

- Computer Science · NIPS
- 2014

This paper replaces the natural gradient with a similarly constructed vector that uses a fixed-window moving average of some of its previous terms; this estimator enjoys significant variance reduction over the unbiased estimates, has smaller bias than averaged gradients, and achieves smaller mean-squared error against the full gradient.
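The moving-average idea above is simple to sketch. This is an illustrative toy on plain stochastic gradients (the paper applies it to natural gradients in stochastic variational inference), and all names and hyperparameters are made up for the demo.

```python
import numpy as np
from collections import deque

def sgd_moving_average(grad_fn, x0, lr=0.05, window=10, steps=300, seed=0):
    """Sketch of SGD where each step uses a fixed-window moving average of
    the most recent stochastic gradients, trading a little bias (from
    stale samples) for a window-fold reduction in gradient variance."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    buf = deque(maxlen=window)             # last `window` gradient samples
    for _ in range(steps):
        buf.append(grad_fn(x, rng))
        x = x - lr * np.mean(buf, axis=0)  # biased but lower-variance step
    return x

# Noisy gradient of f(x) = 0.5 * ||x||^2
x_final = sgd_moving_average(
    lambda x, rng: x + rng.normal(scale=0.5, size=x.shape),
    x0=[4.0, -2.0])
```

Unlike an exponential moving average, the fixed window forgets old gradients completely after `window` steps, which bounds the staleness of the averaged estimate.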

### Local Gain Adaptation in Stochastic Gradient Descent

- Computer Science
- 1999

The limitations of this approach are discussed, and an alternative is developed by extending Sutton's work on linear systems to the general, non-linear case; the resulting online algorithms are computationally only slightly more expensive than other acceleration techniques and do not assume statistical independence between successive training patterns.

### Sticking the Landing: Simple, Lower-Variance Gradient Estimators for Variational Inference

- Computer Science, Mathematics · NIPS
- 2017

We propose a simple and general variant of the standard reparameterized gradient estimator for the variational evidence lower bound. Specifically, we remove a part of the total derivative with…

### Kalman-Based Stochastic Gradient Method with Stop Condition and Insensitivity to Conditioning

- Computer Science · SIAM J. Optim.
- 2016

This work introduces and analyzes a second-order proximal/SGD method based on Kalman filtering, Kalman-based stochastic gradient descent (kSGD), shows that kSGD is asymptotically optimal, and develops a fast algorithm for very large composite objective functions.

### Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

- Computer Science · J. Mach. Learn. Res.
- 2011

This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and yields regret guarantees provably as good as those of the best proximal function chosen in hindsight.
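The best-known instance of this adaptive scheme is AdaGrad, which can be sketched in a few lines. This is a minimal illustrative version with made-up hyperparameters, not the full proximal framework of the paper.

```python
import numpy as np

def adagrad(grad_fn, x0, lr=1.0, eps=1e-8, steps=200, seed=0):
    """Minimal AdaGrad sketch: each coordinate's effective step size
    shrinks with its accumulated squared gradients, so frequently-updated
    coordinates slow down while rarely-updated ones keep larger steps."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    G = np.zeros_like(x)                       # running sum of squared gradients
    for _ in range(steps):
        g = grad_fn(x, rng)
        G += g * g
        x = x - lr * g / (np.sqrt(G) + eps)    # per-coordinate adaptive step
    return x

# Noisy gradient of f(x) = 0.5 * ||x||^2
x_opt = adagrad(lambda x, rng: x + rng.normal(scale=0.1, size=x.shape),
                x0=[2.0, -1.5])
```

The per-coordinate denominator is what makes the learning rate largely self-tuning: a single global `lr` works across coordinates with very different gradient scales.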

### Backpropagation through the Void: Optimizing control variates for black-box gradient estimation

- Computer Science · ICLR
- 2018

This work introduces a general framework for learning low-variance, unbiased gradient estimators for black-box functions of random variables, and gives an unbiased, action-conditional extension of the advantage actor-critic reinforcement learning algorithm.

### An Incremental Gradient(-Projection) Method with Momentum Term and Adaptive Stepsize Rule

- Computer Science · SIAM J. Optim.
- 1998

We consider an incremental gradient method with momentum term for minimizing the sum of continuously differentiable functions. This method uses a new adaptive stepsize rule that decreases the…

### The Incremental Proximal Method: A Probabilistic Perspective

- Mathematics, Computer Science · 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2018

This work shows that the proximal operators coincide with, and hence can be realized by, Bayes updates, and argues that the extended Kalman filter can provide a systematic way to derive practical procedures.