• Corpus ID: 219966551

# Langevin Dynamics for Inverse Reinforcement Learning of Stochastic Gradient Algorithms

@article{Krishnamurthy2020LangevinDF,
title={Langevin Dynamics for Inverse Reinforcement Learning of Stochastic Gradient Algorithms},
author={Vikram Krishnamurthy and George Yin},
journal={ArXiv},
year={2020},
volume={abs/2006.11674}
}
• Published 20 June 2020
• Computer Science
• ArXiv
Inverse reinforcement learning (IRL) aims to estimate the reward function of optimizing agents by observing their response (estimates or actions). This paper considers IRL when noisy estimates of the gradient of a reward function generated by multiple stochastic gradient agents are observed. We present a generalized Langevin dynamics algorithm to estimate the reward function $R(\theta)$; specifically, the resulting Langevin algorithm asymptotically generates samples from the distribution…
## 4 Citations

Multikernel Passive Stochastic Gradient Algorithms and Transfer Learning
• Computer Science
IEEE Transactions on Automatic Control
• 2022
This article develops a novel passive stochastic gradient algorithm that performs substantially better in high dimensional problems and incorporates variance reduction.
• Computer Science
ArXiv
• 2020
A gradient algorithm to adaptively optimize the choice of the skew-symmetric matrix is presented, which involves a non-reversible diffusion cross-coupled with a stochastic gradient algorithm that adapts the skew-symmetric matrix.
• Computer Science
• 2021
This paper abstracts the radar’s cognition masking problem in terms of the spectra (eigenvalues) of the state and observation noise covariance matrices, and embeds the algebraic Riccati equation into an economics-based utility maximization setup.
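The non-reversible idea cited above (perturbing the drift by a skew-symmetric matrix) rests on a standard fact: for any constant skew-symmetric $S$, the diffusion $dX_t = (I + S)\nabla \log \pi(X_t)\,dt + \sqrt{2}\,dW_t$ still has $\pi$ as its stationary distribution, and the irreversible circulation often speeds mixing. A minimal Euler-discretized sketch for a standard Gaussian target follows; the fixed $S$, step size, and target are illustrative, not the cited paper's adaptive scheme.

```python
import numpy as np

rng = np.random.default_rng(1)

S = np.array([[0.0, 1.0],
              [-1.0, 0.0]])        # skew-symmetric perturbation (S = -S.T)
A = np.eye(2) + S                  # non-reversible drift matrix

def grad_log_pi(x):
    """Score of the target pi = N(0, I)."""
    return -x

eps = 0.01                         # Euler step size
x = np.zeros(2)
samples = np.empty((200_000, 2))
for k in range(len(samples)):
    x = x + eps * A @ grad_log_pi(x) + np.sqrt(2 * eps) * rng.normal(size=2)
    samples[k] = x
```

Because $S\nabla\log\pi$ is divergence-free with respect to $\pi$, the stationary law is unchanged; only the dynamics, and hence the mixing rate, differ.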

## References

Showing 1–10 of 32 references
Real-Time Reinforcement Learning of Constrained Markov Decision Processes with Weak Derivatives.
• Computer Science, Mathematics
• 2018
On-line policy gradient algorithms are presented for computing the locally optimal policy of a constrained, average-cost, finite-state Markov decision process, together with a novel simulation-based gradient estimation scheme involving weak derivatives (measure-valued differentiation).
Consistency and Fluctuations For Stochastic Gradient Langevin Dynamics
• Computer Science
J. Mach. Learn. Res.
• 2016
This article proves that, under verifiable assumptions, the SGLD algorithm is consistent, satisfies a central limit theorem (CLT), and that its asymptotic bias-variance decomposition can be characterized by an explicit functional of the step-size sequence $(\delta_m)_{m \ge 0}$.
Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis
• Computer Science, Mathematics
COLT
• 2017
The present work provides a nonasymptotic analysis in the context of non-convex learning problems, giving finite-time guarantees for SGLD to find approximate minimizers of both empirical and population risks.
Gradient Based Policy Optimization of Constrained Markov Decision Processes
• Mathematics, Computer Science
• 2011
This paper presents a stochastic version of a primal-dual (augmented) Lagrange multiplier method for the constrained problem, gives an explicit expression for the asymptotic bias, and discusses several possibilities for bias reduction.
Bayesian Learning via Stochastic Gradient Langevin Dynamics
• Computer Science
ICML
• 2011
In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small mini-batches. By adding the right amount of noise to a standard stochastic gradient optimization algorithm, the iterates converge to samples from the true posterior distribution as the step size is annealed.
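The update this reference describes — a stochastic gradient step on the log-posterior plus injected Gaussian noise — can be sketched for a conjugate Gaussian model. The dataset, fixed step size, and minibatch size below are illustrative assumptions (the original algorithm anneals the step size):

```python
import numpy as np

rng = np.random.default_rng(2)

N, n = 100, 10                                 # dataset size and minibatch size
y = rng.normal(loc=1.0, scale=1.0, size=N)     # y_i ~ N(theta, 1), prior theta ~ N(0, 1)
post_mean = y.sum() / (N + 1)                  # exact posterior: N(sum(y)/(N+1), 1/(N+1))

eps = 1e-3                                     # small fixed step size (illustrative)
theta = 0.0
trace = []
for _ in range(20_000):
    batch = rng.choice(y, size=n, replace=False)
    # minibatch estimate of the log-posterior gradient
    grad = -theta + (N / n) * np.sum(batch - theta)
    # SGLD: half-step on the gradient plus N(0, eps) injected noise
    theta += 0.5 * eps * grad + np.sqrt(eps) * rng.normal()
    trace.append(theta)
```

With the step size annealed to zero the iterates converge weakly to the posterior; with a small fixed step, as here, they hover around it with an $O(\varepsilon)$ bias.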
$Q$-Learning Algorithms for Constrained Markov Decision Processes With Randomized Monotone Policies: Application to MIMO Transmission Control
• Computer Science
IEEE Transactions on Signal Processing
• 2007
Novel Q-learning based stochastic control algorithms for rate and power control in V-BLAST transmission systems are presented and it is shown that this algorithm converges to the optimal solution as long as the power cost estimates are asymptotically unbiased.
Regime Switching Stochastic Approximation Algorithms with Application to Adaptive Discrete Stochastic Optimization
• Mathematics, Computer Science
SIAM J. Optim.
• 2004
By a combined use of the SA method and two-time-scale Markov chains, asymptotic properties of the algorithm are obtained, which are distinct from the usual SA techniques.
Asynchronous Stochastic Approximation Algorithms for Networked Systems: Regime-Switching Topologies and Multiscale Structure
• Computer Science
Multiscale Model. Simul.
• 2013
This work develops asynchronous stochastic approximation algorithms for networked systems with multiagents and regime-switching topologies to achieve consensus control in an asynchronous fashion without using a global clock.
Passive stochastic approximation with constant step size and window width
• Computer Science
IEEE Trans. Autom. Control.
• 1996
Recursion algorithms combining stochastic approximation and kernel estimation are studied, and a weak convergence result is proven for an interpolated sequence of the iterates.
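"Passive" here means the algorithm cannot choose where the regression function is measured; a kernel decides how much each passively observed measurement influences the iterate. A minimal sketch with constant step size and window width follows, using a hypothetical linear regression function $g(x) = 2 - x$ (root at $x = 2$) and illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(3)

def measure(x):
    """Noisy measurement of g(x) = 2 - x, whose root is x = 2."""
    return (2.0 - x) + 0.1 * rng.normal()

mu, h = 0.01, 0.3            # constant step size and kernel window width
alpha = 0.0
for _ in range(100_000):
    x = rng.uniform(0.0, 4.0)                    # observation site, not chosen by us
    y = measure(x)
    k = np.exp(-0.5 * ((x - alpha) / h) ** 2)    # Gaussian kernel weight
    alpha += mu * k * y                          # kernel-weighted SA update
```

The kernel discounts measurements taken far from the current iterate, so the recursion behaves locally like ordinary Robbins–Monro and settles (weakly, for constant $\mu$) into a neighborhood of the root of $g$.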
Langevin-Type Models I: Diffusions with Given Stationary Distributions and their Discretizations
• Computer Science, Mathematics
• 1999
We describe algorithms for estimating a given measure π known up to a constant of proportionality, based on a large class of diffusions (extending the Langevin model) for which π is invariant.
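The basic construction this reference studies — a diffusion $dX_t = \nabla \log \pi(X_t)\,dt + \sqrt{2}\,dW_t$ that leaves $\pi$ invariant, needing $\pi$ only up to its normalizing constant — discretizes by Euler–Maruyama into the unadjusted Langevin algorithm. A sketch for a bimodal target; the target density and step size are illustrative, and a reference like this one is precisely about when such discretizations need care (e.g. a Metropolis correction) to control bias:

```python
import numpy as np

rng = np.random.default_rng(4)

def grad_log_pi(x):
    """Score of the unnormalized bimodal target pi(x) ~ exp(-(x**2 - 1)**2)."""
    return -4.0 * x * (x**2 - 1.0)

eps = 0.01                        # Euler-Maruyama step size
x = 0.0
chain = np.empty(200_000)
for k in range(len(chain)):
    x = x + eps * grad_log_pi(x) + np.sqrt(2 * eps) * rng.normal()
    chain[k] = x
```

Note that only $\nabla \log \pi$ enters the recursion, so the intractable normalizing constant of $\pi$ is never needed; the chain visits both modes at $\pm 1$.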