• Corpus ID: 53097415

Kalman Gradient Descent: Adaptive Variance Reduction in Stochastic Optimization

  • James Vuckovic
We introduce Kalman Gradient Descent, a stochastic optimization algorithm that uses Kalman filtering to adaptively reduce gradient variance in stochastic gradient descent by filtering the gradient estimates. We present both a theoretical analysis of convergence in a non-convex setting and experimental results which demonstrate improved performance on a variety of machine learning areas including neural networks and black box variational inference. We also present a distributed version of our… 
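
The idea of filtering gradient estimates before taking a descent step can be illustrated with a minimal sketch. This is not the algorithm from the paper: it assumes a much simpler per-coordinate scalar Kalman filter with a random-walk model for the gradient, and the names `kalman_filtered_sgd`, `noisy_quadratic_grad`, `q` (process noise variance), and `r` (observation noise variance) are hypothetical choices made only for this illustration.

```python
import numpy as np


def kalman_filtered_sgd(grad_fn, x0, steps=500, lr=0.1, q=1e-2, r=1.0, seed=0):
    """SGD driven by a Kalman-filtered gradient (illustrative sketch only).

    Assumed model, per coordinate: the true gradient follows a random walk
    with variance q, and each stochastic gradient is the true gradient plus
    noise with variance r.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    g_hat = np.zeros_like(x)  # filtered gradient estimate
    p = np.ones_like(x)       # variance of the gradient estimate
    for _ in range(steps):
        g_obs = grad_fn(x, rng)              # noisy gradient observation
        p = p + q                            # predict: uncertainty grows
        k = p / (p + r)                      # Kalman gain in (0, 1)
        g_hat = g_hat + k * (g_obs - g_hat)  # update toward the observation
        p = (1.0 - k) * p                    # posterior variance shrinks
        x = x - lr * g_hat                   # descent step on filtered gradient
    return x


def noisy_quadratic_grad(x, rng, noise_std=1.0):
    """Gradient of f(x) = 0.5 * ||x||^2 corrupted by Gaussian noise."""
    return x + noise_std * rng.standard_normal(x.shape)


x_final = kalman_filtered_sgd(noisy_quadratic_grad, x0=[5.0, -3.0])
```

With fixed `q` and `r` the gain settles to a constant, so this sketch reduces to an exponential moving average of the gradients. An adaptive method would instead estimate the noise statistics or use a richer state-space model.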

KOALA: A Kalman Optimization Algorithm with Loss Adaptivity

KOALA uses a Kalman filter dynamical model for the evolution of the unknown parameters, which can capture the gradient dynamics of advanced methods such as Momentum and Adam. The result is an easy-to-implement, scalable, and efficient method for training neural networks.

A Probabilistic Incremental Proximal Gradient Method

The PIPG algorithm takes the form of Bayesian filtering updates for a state-space model constructed by using the cost function, which makes it possible to utilize well-known exact or approximate Bayesian filters to solve large-scale regularized optimization problems.

A Latent Variational Framework for Stochastic Optimization

This framework establishes a direct connection between stochastic optimization algorithms and a secondary Bayesian inference problem on gradients, where a prior measure on noisy gradient observations determines the resulting algorithm.

KaFiStO: A Kalman Filtering Framework for Stochastic Optimization

KaFiStO uses a Kalman filter dynamical model for the evolution of the unknown parameters, which can capture the gradient dynamics of advanced methods such as Momentum and Adam. The result is an easy-to-implement, scalable, and efficient method for training neural networks.

Fast Stochastic Kalman Gradient Descent for Reinforcement Learning

A randomized regularization technique called Stochastic Kalman Gradient Descent (SKGD) is introduced that, combined with a low rank update, generates a sequence of feasible iterates that is suitable for large scale optimization of non-linear function approximators.

Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering

The model-based procedure is proved to converge in the noisy quadratic setting, and it can match the performance of well-tuned optimizers; the authors present this as a step toward constructing self-tuning optimizers.

An enhanced learning algorithm with a particle filter-based gradient descent optimizer method

A particle filter-based gradient descent (PF-GD) optimizer is proposed that can determine the global minimum and performs much better than the conventional gradient descent optimizer, although it has some parameters that must be set before modeling.

Kalman meets Bellman: Improving Policy Evaluation through Value Tracking

An optimization method called Kalman Optimization for Value Approximation (KOVA) is presented and analyzed. It can be incorporated as a policy evaluation component in policy optimization algorithms, and it minimizes a regularized objective function that accounts for both parameter and noisy return uncertainties.

Trust Region Value Optimization using Kalman Filtering

This work presents a novel optimization method, the Kalman Optimization for Value Approximation (KOVA), based on the Extended Kalman Filter, which minimizes the regularized objective function by adopting a Bayesian perspective over both the value parameters and noisy observed returns.



Kalman filtering in stochastic gradient algorithms: construction of a stopping rule

It is shown how a simple Kalman filter can be used to estimate the gradient, with some associated confidence, and thus to construct a stopping rule for the algorithm. The approach is illustrated by a simple example.

Variance Reduction for Stochastic Gradient Optimization

This paper demonstrates how to construct the control variate for two practical problems using stochastic gradient optimization: one is convex (MAP estimation for logistic regression), and the other is non-convex (stochastic variational inference for latent Dirichlet allocation).

Smoothed Gradients for Stochastic Variational Inference

This paper replaces the natural gradient with a similarly constructed vector that uses a fixed-window moving average of some of its previous terms. The resulting estimator enjoys significant variance reduction over the unbiased estimates, has smaller bias than averaged gradients, and leads to smaller mean-squared error against the full gradient.

Local Gain Adaptation in Stochastic Gradient Descent

The limitations of this approach are discussed, and an alternative is developed by extending Sutton's work on linear systems to the general, nonlinear case. The resulting online algorithms are computationally little more expensive than other acceleration techniques and do not assume statistical independence between successive training patterns.

Sticking the Landing: Simple, Lower-Variance Gradient Estimators for Variational Inference

We propose a simple and general variant of the standard reparameterized gradient estimator for the variational evidence lower bound. Specifically, we remove a part of the total derivative with

Kalman-Based Stochastic Gradient Method with Stop Condition and Insensitivity to Conditioning

  • V. Patel
  • Computer Science
    SIAM J. Optim.
  • 2016
This work introduces and analyzes a second order proximal/SGD method based on Kalman filtering, Kalman-based stochastic gradient descent (kSGD), and shows kSGD is asymptotically optimal, and develops a fast algorithm for very large composite objective functions.

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal functions that can be chosen in hindsight.

Backpropagation through the Void: Optimizing control variates for black-box gradient estimation

This work introduces a general framework for learning low-variance, unbiased gradient estimators for black-box functions of random variables, and gives an unbiased, action-conditional extension of the advantage actor-critic reinforcement learning algorithm.

An Incremental Gradient(-Projection) Method with Momentum Term and Adaptive Stepsize Rule

  • P. Tseng
  • Computer Science
    SIAM J. Optim.
  • 1998
We consider an incremental gradient method with momentum term for minimizing the sum of continuously differentiable functions. This method uses a new adaptive stepsize rule that decreases the

The Incremental Proximal Method: A Probabilistic Perspective

This work shows that the proximal operators coincide, and hence can be realized with, Bayes updates, and argues that the extended Kalman filter can provide a systematic way for the derivation of practical procedures.