# Langevin Dynamics for Inverse Reinforcement Learning of Stochastic Gradient Algorithms

    @article{Krishnamurthy2020LangevinDF,
      title={Langevin Dynamics for Inverse Reinforcement Learning of Stochastic Gradient Algorithms},
      author={Vikram Krishnamurthy and George Yin},
      journal={ArXiv},
      year={2020},
      volume={abs/2006.11674}
    }

Inverse reinforcement learning (IRL) aims to estimate the reward function of optimizing agents by observing their response (estimates or actions). This paper considers IRL when noisy estimates of the gradient of a reward function generated by multiple stochastic gradient agents are observed. We present a generalized Langevin dynamics algorithm to estimate the reward function $R(\theta)$; specifically, the resulting Langevin algorithm asymptotically generates samples from the distribution…
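The abstract's idea can be illustrated with a toy unadjusted Langevin recursion (a minimal sketch, not the paper's passive algorithm): given noisy observations of the gradient $\nabla R(\theta)$, the iterates $\theta_{k+1} = \theta_k + \tfrac{\varepsilon}{2}\,\widehat{\nabla R}(\theta_k) + \sqrt{\varepsilon}\, w_k$ with $w_k \sim \mathcal{N}(0,1)$ asymptotically sample from a density proportional to $\exp(R(\theta))$. The reward $R(\theta) = -\theta^2/2$, step size, and noise level below are illustrative choices:

```python
import numpy as np

def langevin_samples(noisy_grad_r, theta0, eps, n_steps, rng):
    """Unadjusted Langevin iteration driven by noisy reward-gradient observations."""
    theta = theta0
    samples = np.empty(n_steps)
    for k in range(n_steps):
        g = noisy_grad_r(theta, rng)  # agent's noisy estimate of grad R(theta)
        theta = theta + 0.5 * eps * g + np.sqrt(eps) * rng.standard_normal()
        samples[k] = theta
    return samples

# Toy reward R(theta) = -theta^2/2, so exp(R) is a standard Gaussian density.
rng = np.random.default_rng(0)
noisy_grad = lambda t, rng: -t + 0.1 * rng.standard_normal()
s = langevin_samples(noisy_grad, 0.0, eps=0.05, n_steps=200_000, rng=rng)
burn = s[50_000:]  # discard burn-in; mean should be near 0, variance near 1
```

With a small step size the empirical mean and variance of `burn` approach those of the target $\mathcal{N}(0,1)$, up to discretization bias and the injected gradient noise.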

## 4 Citations

Multikernel Passive Stochastic Gradient Algorithms and Transfer Learning

- Computer Science, IEEE Transactions on Automatic Control
- 2022

This article develops a novel passive stochastic gradient algorithm that performs substantially better in high dimensional problems and incorporates variance reduction.

Multi-kernel Passive Stochastic Gradient Algorithms

- Computer Science, ArXiv
- 2020

This paper develops a novel passive stochastic gradient algorithm that performs substantially better in high dimensional problems and incorporates variance reduction.

Adaptive Non-reversible Stochastic Gradient Langevin Dynamics

- Computer Science, ArXiv
- 2020

A gradient algorithm to adaptively optimize the choice of the skew-symmetric matrix is presented, involving a non-reversible diffusion cross-coupled with a stochastic gradient algorithm that adapts the skew-symmetric matrix.

How can a Cognitive Radar Mask its Cognition?

- Computer Science
- 2021

This paper abstracts the radar’s cognition masking problem in terms of the spectra (eigenvalues) of the state and observation noise covariance matrices, and embeds the algebraic Riccati equation into an economics-based utility maximization setup.

## References

Showing 1–10 of 32 references.

Real-Time Reinforcement Learning of Constrained Markov Decision Processes with Weak Derivatives

- Computer Science, Mathematics
- 2018

This paper presents on-line policy gradient algorithms for computing the locally optimal policy of a constrained, average-cost, finite-state Markov decision process, together with a novel simulation-based gradient estimation scheme involving weak derivatives (measure-valued differentiation).

Consistency and Fluctuations For Stochastic Gradient Langevin Dynamics

- Computer Science, J. Mach. Learn. Res.
- 2016

This article proves that, under verifiable assumptions, the SGLD algorithm is consistent, satisfies a central limit theorem (CLT) and its asymptotic bias-variance decomposition can be characterized by an explicit functional of the step-sizes sequence (δm)m≥0.

Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis

- Computer Science, Mathematics, COLT
- 2017

The present work provides a nonasymptotic analysis in the context of non-convex learning problems, giving finite-time guarantees for SGLD to find approximate minimizers of both empirical and population risks.

Gradient Based Policy Optimization of Constrained Markov Decision Processes

- Mathematics, Computer Science
- 2011

This paper presents a stochastic version of a primal dual (augmented) Lagrange multiplier method for the constrained algorithm and gives an explicit expression of the asymptotic bias and discusses several possibilities for bias reduction.

Bayesian Learning via Stochastic Gradient Langevin Dynamics

- Computer Science, ICML
- 2011

In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small mini-batches. By adding the right amount of noise to a standard stochastic…
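The SGLD update from that paper adds Gaussian noise of variance $\varepsilon_t$ to a minibatch stochastic gradient step on the log-posterior. A toy sketch on a conjugate Gaussian model, where the exact posterior mean is known (the data, seed, step size, and batch size below are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
N, n, eps = 1000, 50, 1e-4          # dataset size, minibatch size, step size
x = rng.normal(2.0, 1.0, size=N)    # synthetic data; prior N(0,1), likelihood N(theta,1)

theta, samples = 0.0, []
for t in range(20_000):
    batch = rng.choice(x, size=n, replace=False)
    # SGLD gradient: log-prior gradient + (N/n)-rescaled minibatch likelihood gradient
    grad = -theta + (N / n) * np.sum(batch - theta)
    theta += 0.5 * eps * grad + np.sqrt(eps) * rng.standard_normal()
    samples.append(theta)

post_mean = N * x.mean() / (N + 1)  # exact conjugate posterior mean
chain_mean = float(np.mean(samples[5_000:]))
```

After burn-in, `chain_mean` tracks `post_mean`; with a constant step size the chain is slightly overdispersed relative to the exact posterior, which is the bias the SGLD analyses cited above quantify.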

Q-Learning Algorithms for Constrained Markov Decision Processes With Randomized Monotone Policies: Application to MIMO Transmission Control

- Computer Science, IEEE Transactions on Signal Processing
- 2007

Novel Q-learning based stochastic control algorithms for rate and power control in V-BLAST transmission systems are presented and it is shown that this algorithm converges to the optimal solution as long as the power cost estimates are asymptotically unbiased.

Regime Switching Stochastic Approximation Algorithms with Application to Adaptive Discrete Stochastic Optimization

- Mathematics, Computer Science, SIAM J. Optim.
- 2004

By a combined use of the SA method and two-time-scale Markov chains, asymptotic properties of the algorithm are obtained, which are distinct from the usual SA techniques.

Asynchronous Stochastic Approximation Algorithms for Networked Systems: Regime-Switching Topologies and Multiscale Structure

- Computer Science, Multiscale Model. Simul.
- 2013

This work develops asynchronous stochastic approximation algorithms for networked systems with multiagents and regime-switching topologies to achieve consensus control in an asynchronous fashion without using a global clock.

Passive stochastic approximation with constant step size and window width

- Computer Science, IEEE Trans. Autom. Control.
- 1996

Recursive algorithms combining stochastic approximation and kernel estimation are studied, and a weak convergence result is proven for an interpolated sequence of the iterates.

Langevin-Type Models I: Diffusions with Given Stationary Distributions and their Discretizations

- Computer Science, Mathematics
- 1999

We describe algorithms for estimating a given measure π known up to a constant of proportionality, based on a large class of diffusions (extending the Langevin model) for which π is invariant. We…