# Stochastic Proximal Langevin Algorithm: Potential Splitting and Nonasymptotic Rates

@inproceedings{Salim2019StochasticPL, title={Stochastic Proximal Langevin Algorithm: Potential Splitting and Nonasymptotic Rates}, author={Adil Salim and D. Kovalev and Peter Richt{\'a}rik}, booktitle={NeurIPS}, year={2019} }

We propose a new algorithm---Stochastic Proximal Langevin Algorithm (SPLA)---for sampling from a log-concave distribution. Our method is a generalization of the Langevin algorithm to potentials expressed as the sum of one stochastic smooth term and multiple stochastic nonsmooth terms. In each iteration, our splitting technique only requires access to a stochastic gradient of the smooth term and a stochastic proximal operator for each of the nonsmooth terms. We establish nonasymptotic sublinear…
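The splitting described in the abstract can be sketched as follows. This is a hypothetical illustration (the function names, the one-dimensional target, and the exact ordering of the noise and proximal steps are assumptions), not the paper's verified SPLA pseudocode.

```python
import numpy as np

def spla_step(x, grad_f, proxes, gamma, rng):
    # Forward step: stochastic gradient of the smooth term.
    y = x - gamma * grad_f(x)
    # Langevin noise with variance 2*gamma.
    y = y + np.sqrt(2.0 * gamma) * rng.standard_normal(x.shape)
    # Backward steps: one (stochastic) proximal operator per nonsmooth term.
    for prox in proxes:
        y = prox(y, gamma)
    return y

# Illustration: sample from pi(x) proportional to exp(-(x**2 / 2 + |x|)),
# with smooth term f(x) = x**2 / 2 and nonsmooth term g(x) = |x|,
# whose proximal operator is soft-thresholding.
grad_f = lambda x: x
prox_l1 = lambda v, g: np.sign(v) * np.maximum(np.abs(v) - g, 0.0)

rng = np.random.default_rng(0)
x = np.zeros(1)
samples = []
for t in range(10000):
    x = spla_step(x, grad_f, [prox_l1], gamma=0.05, rng=rng)
    if t >= 2000:
        samples.append(x[0])
# The target is symmetric about 0, so the sample mean should be near 0.
```

With several nonsmooth terms, the loop over `proxes` applies one proximal step per term, which is the potential-splitting idea: no proximal operator of the full sum is ever needed.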

## 10 Citations

Penalized Langevin dynamics with vanishing penalty for smooth and log-concave targets

- Mathematics, NeurIPS
- 2020

An upper bound on the Wasserstein-2 distance between the distribution of the PLD at time $t$ and the target is established and a new nonasymptotic guarantee of convergence of the penalized gradient flow for the optimization problem is inferred.

Proximal Langevin Algorithm: Rapid Convergence Under Isoperimetry

- Computer Science, ArXiv
- 2019

A convergence guarantee for PLA in Kullback-Leibler (KL) divergence when $\nu$ satisfies log-Sobolev inequality (LSI) and $f$ has bounded second and third derivatives is proved.

Wasserstein Proximal Gradient

- Computer Science
- 2020

This work adopts a Forward Backward (FB) Euler scheme for the discretization of the gradient flow of the relative entropy, and provides a closed form formula for the proximity operator of the entropy.

Convergence Error Analysis of Reflected Gradient Langevin Dynamics for Globally Optimizing Non-Convex Constrained Problems

- Computer Science, Mathematics
- 2022

This work analyzes reflected gradient Langevin dynamics (RGLD), a global optimization algorithm for smoothly constrained problems, including non-convex constrained ones, and derives a convergence rate to a solution with $\epsilon$-sampling error that is faster than the one given by Lamperski (2021) for convex constrained cases.

Gradient-Based Markov Chain Monte Carlo for Bayesian Inference With Non-differentiable Priors

- Mathematics
- 2021

Piecewise-Deterministic Markov Processes can be utilized for exact posterior inference from distributions that are almost everywhere differentiable; the suggested PDMP-based samplers place no assumptions on the prior shape, do not require access to a computationally cheap proximal operator, and consequently have a much broader scope of application.

Variational Transport: A Convergent Particle-Based Algorithm for Distributional Optimization

- Computer Science, ArXiv
- 2020

It is proved that moving along the geodesic in the direction of functional gradient with respect to the second-order Wasserstein distance is equivalent to applying a pushforward mapping to a probability distribution, which can be approximated accurately by pushing a set of particles.

Asymptotically Exact Data Augmentation: Models, Properties, and Algorithms

- Computer Science, Mathematics, J. Comput. Graph. Stat.
- 2021

A unified framework, coined asymptotically exact data augmentation (AXDA), which encompasses both well-established and more recent approximate augmented models, is studied; it is shown that AXDA models can benefit from interesting statistical properties and yield efficient inference algorithms.

Structured Logconcave Sampling with a Restricted Gaussian Oracle

- Computer Science, Mathematics, COLT
- 2021

A reduction framework is developed, inspired by proximal point methods in convex optimization, which bootstraps samplers for regularized densities to improve dependences on problem conditioning and gives algorithms for sampling several structured logconcave families to high accuracy.

Primal Dual Interpretation of the Proximal Stochastic Gradient Langevin Algorithm

- Computer Science, Mathematics, NeurIPS
- 2020

A strong duality result is established for the minimization problem associated with sampling from a log-concave probability distribution, and it is shown that if the potential is strongly convex, the complexity of PSGLA is $\mathcal{O}(1/\varepsilon^2)$ in terms of the 2-Wasserstein distance.

The Wasserstein Proximal Gradient Algorithm

- Computer Science, NeurIPS
- 2020

This work proposes a Forward Backward (FB) discretization scheme that can tackle the case where the objective function is the sum of a smooth term and a nonsmooth geodesically convex term, and shows under mild assumptions that the FB scheme has convergence guarantees similar to those of the proximal gradient algorithm in Euclidean spaces.
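The Euclidean proximal gradient (forward-backward) iteration that the cited work lifts to Wasserstein space can be sketched as follows; a minimal illustration under assumed names, with a lasso-style example where the fixed point is known in closed form.

```python
import numpy as np

def fb_step(x, grad_f, prox_g, gamma):
    # Forward step on the smooth term f, then backward (proximal)
    # step on the nonsmooth term g.
    return prox_g(x - gamma * grad_f(x), gamma)

# Illustration: minimize 0.5*||x - b||^2 + lam*||x||_1, whose
# minimizer is the soft-thresholding of b.
b = np.array([3.0, -0.2, 1.0])
lam = 0.5
grad_f = lambda x: x - b
prox_g = lambda v, g: np.sign(v) * np.maximum(np.abs(v) - g * lam, 0.0)

x = np.zeros(3)
for _ in range(50):
    x = fb_step(x, grad_f, prox_g, gamma=1.0)
```

In the Wasserstein-space analogue, the backward step acts on a functional of probability measures (e.g. the entropy) rather than on a Euclidean nonsmooth term.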

## References

Showing 1-10 of 59 references

Langevin Monte Carlo and JKO splitting

- Computer Science, COLT
- 2018

This work shows that a proximal version of the Unadjusted Langevin Algorithm corresponds to a scheme that alternates between solving the gradient flows of two specific functionals on the space of probability measures.

On the Theory of Variance Reduction for Stochastic Gradient Monte Carlo

- Computer Science, ICML
- 2018

These methods are analyzed under a uniform set of assumptions on the log-posterior distribution, assuming it to be smooth, strongly convex and Hessian Lipschitz, to provide convergence guarantees in Wasserstein distance for a variety of variance-reduction methods.

Analysis of Langevin Monte Carlo via Convex Optimization

- Computer Science, J. Mach. Learn. Res.
- 2019

It is shown that the Unadjusted Langevin Algorithm can be formulated as a first order optimization algorithm of an objective functional defined on the Wasserstein space of order $2$ and a non-asymptotic analysis of this method to sample from logconcave smooth target distribution is given.

Stochastic Three-Composite Convex Minimization

- Computer Science, Mathematics, NIPS
- 2016

This work proves convergence in expectation of the proposed algorithm for minimizing the sum of three convex functions, under standard assumptions on the stochastic gradient estimate of the smooth term.

Nonasymptotic convergence of stochastic proximal point algorithms for constrained convex optimization

- Computer Science, Mathematics
- 2017

This work introduces a new variant of the stochastic proximal point method (SPP) for solving stochastic convex optimization problems subject to an (in)finite intersection of constraints satisfying a linear regularity type condition, and proves new nonasymptotic convergence results for convex and Lipschitz continuous objective functions.

Nonasymptotic convergence of stochastic proximal point methods for constrained convex optimization

- Mathematics, Computer Science, J. Mach. Learn. Res.
- 2017

This work introduces a new variant of the SPP method for solving stochastic convex problems subject to (in)finite intersection of constraints satisfying a linear regularity condition, and proves new nonasymptotic convergence results for convex Lipschitz continuous objective functions.

Bayesian Learning via Stochastic Gradient Langevin Dynamics

- Computer Science, ICML
- 2011

In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small mini-batches. By adding the right amount of noise to a standard stochastic…
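The SGLD mechanism described above (a small-step stochastic gradient update plus appropriately scaled Gaussian noise) can be sketched as follows; a hypothetical illustration with assumed names, using a standard normal target in place of a mini-batch posterior estimate.

```python
import numpy as np

def sgld_step(theta, stoch_grad_logpost, step, rng):
    # One SGLD update: a stochastic-gradient step on the log-posterior
    # plus Gaussian noise with variance equal to the step size.
    noise = np.sqrt(step) * rng.standard_normal(theta.shape)
    return theta + 0.5 * step * stoch_grad_logpost(theta) + noise

# Illustration: standard normal target, whose grad log-density is -theta.
# In Bayesian learning, stoch_grad_logpost would instead be estimated from
# a mini-batch of the data.
rng = np.random.default_rng(1)
theta = np.zeros(1)
samples = []
for t in range(20000):
    theta = sgld_step(theta, lambda th: -th, step=0.05, rng=rng)
    if t >= 2000:
        samples.append(theta[0])
# Samples should have mean near 0 and variance near 1.
```
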

Non-asymptotic convergence analysis for the Unadjusted Langevin Algorithm

- Mathematics, Computer Science
- 2015

For both constant and decreasing step sizes in the Euler discretization, non-asymptotic bounds for the convergence to the target distribution $\pi$ in total variation distance are obtained.

Is There an Analog of Nesterov Acceleration for MCMC?

- Computer Science, ArXiv
- 2019

We formulate gradient-based Markov chain Monte Carlo (MCMC) sampling as optimization on the space of probability measures, with Kullback-Leibler (KL) divergence as the objective functional. We show…

On sampling from a log-concave density using kinetic Langevin diffusions

- Computer Science, ArXiv
- 2018

It is proved that the kinetic Langevin diffusion has a geometric mixing property, with a mixing rate that is, in the overdamped regime, optimal in terms of its dependence on the condition number.