Corpus ID: 219966551

Langevin Dynamics for Inverse Reinforcement Learning of Stochastic Gradient Algorithms

@article{Krishnamurthy2020LangevinDF,
  title={Langevin Dynamics for Inverse Reinforcement Learning of Stochastic Gradient Algorithms},
  author={Vikram Krishnamurthy and George Yin},
  journal={ArXiv},
  year={2020},
  volume={abs/2006.11674}
}
Inverse reinforcement learning (IRL) aims to estimate the reward function of optimizing agents by observing their response (estimates or actions). This paper considers IRL when noisy estimates of the gradient of a reward function generated by multiple stochastic gradient agents are observed. We present a generalized Langevin dynamics algorithm to estimate the reward function $R(\theta)$; specifically, the resulting Langevin algorithm asymptotically generates samples from the distribution… 
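The setting in the abstract can be illustrated with a toy numerical sketch: an observer sees pairs $(\theta_k, \widehat{\nabla} R(\theta_k))$ generated by stochastic gradient agents and runs a kernel-weighted (passive) Langevin recursion whose iterates concentrate where $R$ is large. This is a minimal illustration of the idea, not the paper's exact algorithm; the reward, kernel, bandwidth, step size, and temperature below are assumptions chosen for the demo.

```python
# Toy sketch (not the paper's exact algorithm): passive Langevin recursion for IRL.
import numpy as np

rng = np.random.default_rng(0)

def grad_R(theta):
    # Hypothetical scalar reward R(theta) = -(theta - 1)^2, so grad R(theta) = -2 (theta - 1).
    return -2.0 * (theta - 1.0)

# Observed data: stochastic gradient agents reveal noisy gradient evaluations at their iterates.
T = 20_000
theta_obs = rng.uniform(-3.0, 5.0, size=T)           # agents' evaluation points (i.i.d. here for simplicity)
g_obs = grad_R(theta_obs) + rng.normal(0.0, 1.0, T)  # noisy gradients of the unknown reward R

# Passive Langevin-type recursion run by the IRL observer (illustrative constants).
EPS, DELTA, BETA = 0.01, 0.3, 1.0   # step size, kernel bandwidth, inverse temperature
alpha, samples = 0.0, []
for k in range(T):
    # Gaussian kernel weight: observations made near the current iterate alpha count the most.
    w = np.exp(-0.5 * ((theta_obs[k] - alpha) / DELTA) ** 2) / DELTA
    alpha += EPS * w * g_obs[k] + np.sqrt(2.0 * EPS / BETA) * rng.normal()
    samples.append(alpha)

# The late iterates behave like samples from a density proportional to exp(R) (up to scaling),
# so a histogram of them recovers R up to an additive constant; the toy reward peaks at theta = 1.
print("sample mean (toy reward peaks at 1.0):", np.mean(samples[T // 2 :]))
```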

Citations

Multikernel Passive Stochastic Gradient Algorithms and Transfer Learning
TLDR
This article develops a novel passive stochastic gradient algorithm that performs substantially better in high dimensional problems and incorporates variance reduction.
Multi-kernel Passive Stochastic Gradient Algorithms
TLDR
This paper develops a novel passive stochastic gradient algorithm that performs substantially better in high dimensional problems and incorporates variance reduction.
Adaptive Non-reversible Stochastic Gradient Langevin Dynamics
TLDR
A gradient algorithm to adaptively optimize the choice of the skew-symmetric matrix is presented; it involves a non-reversible diffusion cross-coupled with a stochastic gradient algorithm that adapts the skew-symmetric matrix.
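For context, a minimal sketch of a non-reversible Langevin update with a fixed skew-symmetric matrix is given below; the cited paper goes further and adapts that matrix online, which this sketch does not attempt. The 2-D Gaussian target and the matrix J are assumptions for illustration only.

```python
# Toy non-reversible Langevin sampler: a constant skew-symmetric J in the drift preserves the
# stationary distribution while breaking reversibility, which can speed up mixing.
import numpy as np

rng = np.random.default_rng(1)

# Target: zero-mean Gaussian with covariance SIGMA; grad log pi(x) = -SIGMA^{-1} x.
SIGMA = np.array([[1.0, 0.8], [0.8, 2.0]])
SIGMA_INV = np.linalg.inv(SIGMA)

J = np.array([[0.0, 1.5], [-1.5, 0.0]])   # skew-symmetric: J.T == -J
EPS, N_STEPS = 1e-3, 200_000

x = np.zeros(2)
samples = np.empty((N_STEPS, 2))
for k in range(N_STEPS):
    grad_log_pi = -SIGMA_INV @ x
    # Non-reversible Langevin step: drift (I + J) grad log pi plus isotropic Gaussian noise.
    x = x + EPS * (np.eye(2) + J) @ grad_log_pi + np.sqrt(2.0 * EPS) * rng.standard_normal(2)
    samples[k] = x

print("empirical covariance (should be close to SIGMA):\n", np.cov(samples[N_STEPS // 2 :].T))
```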
How can a Cognitive Radar Mask its Cognition?
TLDR
This paper abstracts the radar’s cognition masking problem in terms of the spectra (eigenvalues) of the state and observation noise covariance matrices, and embeds the algebraic Riccati equation into an economics-based utility maximization setup.

References

SHOWING 1-10 OF 32 REFERENCES
Real-Time Reinforcement Learning of Constrained Markov Decision Processes with Weak Derivatives.
TLDR
This paper presents online policy gradient algorithms for computing the locally optimal policy of a constrained, average-cost, finite-state Markov decision process, together with a novel simulation-based gradient estimation scheme using weak derivatives (measure-valued differentiation).
Consistency and Fluctuations For Stochastic Gradient Langevin Dynamics
TLDR
This article proves that, under verifiable assumptions, the SGLD algorithm is consistent, satisfies a central limit theorem (CLT), and that its asymptotic bias-variance decomposition can be characterized by an explicit functional of the step-size sequence $(\delta_m)_{m \geq 0}$.
Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis
TLDR
The present work provides a nonasymptotic analysis in the context of non-convex learning problems, giving finite-time guarantees for SGLD to find approximate minimizers of both empirical and population risks.
Gradient Based Policy Optimization of Constrained Markov Decision Processes
TLDR
This paper presents a stochastic version of a primal-dual (augmented) Lagrange multiplier method for the constrained problem, gives an explicit expression for the asymptotic bias, and discusses several possibilities for bias reduction.
Bayesian Learning via Stochastic Gradient Langevin Dynamics
In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small mini-batches. By adding the right amount of noise to a standard stochastic gradient optimization algorithm, we show that the iterates will converge to samples from the true posterior distribution as we anneal the stepsize.
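A minimal sketch of the SGLD update described in this abstract, on a toy conjugate model (assumed here purely for illustration): each iteration takes half a step of a rescaled mini-batch stochastic gradient on the log posterior and injects Gaussian noise whose variance equals the step size.

```python
# Toy SGLD: posterior over the mean mu of a 1-D Gaussian likelihood with a Gaussian prior.
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data: x_i ~ N(true_mu, 1); prior mu ~ N(0, 10).
N, true_mu = 1000, 2.0
data = rng.normal(true_mu, 1.0, N)

def grad_log_prior(mu):
    return -mu / 10.0

def grad_log_lik(mu, batch):
    return np.sum(batch - mu)

BATCH, STEPS = 50, 20_000
mu, samples = 0.0, []
for t in range(STEPS):
    eps_t = 1e-3 / (1 + t) ** 0.55                   # decreasing step-size schedule
    batch = rng.choice(data, BATCH, replace=False)
    # Mini-batch gradient of the log posterior, rescaled by N / BATCH.
    grad = grad_log_prior(mu) + (N / BATCH) * grad_log_lik(mu, batch)
    mu += 0.5 * eps_t * grad + rng.normal(0.0, np.sqrt(eps_t))   # injected noise ~ N(0, eps_t)
    samples.append(mu)

# Late iterates approximate posterior draws; the posterior mean is close to true_mu for large N.
print("posterior mean estimate:", np.mean(samples[STEPS // 2 :]))
```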
How does a stochastic optimization/approximation algorithm adapt to a randomly evolving optimum/root with jump Markov sample paths
TLDR
This paper analyzes the asymptotic properties of stochastic optimization/approximation algorithms for recursively estimating the optimum or root when it evolves rapidly with nonsmooth (jump-changing) sample paths and proves convergence of the algorithm.
Q-Learning Algorithms for Constrained Markov Decision Processes With Randomized Monotone Policies: Application to MIMO Transmission Control
TLDR
Novel Q-learning based stochastic control algorithms for rate and power control in V-BLAST transmission systems are presented, and it is shown that these algorithms converge to the optimal solution as long as the power cost estimates are asymptotically unbiased.
Regime Switching Stochastic Approximation Algorithms with Application to Adaptive Discrete Stochastic Optimization
TLDR
By a combined use of the SA method and two-time-scale Markov chains, asymptotic properties of the algorithm are obtained via techniques distinct from the usual SA analysis.
Asynchronous Stochastic Approximation Algorithms for Networked Systems: Regime-Switching Topologies and Multiscale Structure
TLDR
This work develops asynchronous stochastic approximation algorithms for networked systems with multiple agents and regime-switching topologies, achieving consensus control in an asynchronous fashion without using a global clock.
Passive stochastic approximation with constant step size and window width
  • G. Yin and K. Yin, IEEE Trans. Autom. Control, 1996
TLDR
Recursive algorithms combining stochastic approximation and kernel estimation are developed, and a weak convergence result is proven for an interpolated sequence of the iterates.
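A hedged sketch of a passive stochastic approximation recursion in this spirit: the algorithm seeks a root of an unknown function but only observes noisy function values at query points it does not control, so each observation is weighted by a kernel centred at the current iterate. The constant step size and bandwidth below are illustrative assumptions, not the reference's tuned choices.

```python
# Toy passive stochastic approximation: root-finding from passively observed (x_k, y_k) pairs.
import numpy as np

rng = np.random.default_rng(3)

def f(x):
    return 3.0 - x          # unknown regression function; root at x = 3

T, EPS, DELTA = 50_000, 0.05, 0.2
x_obs = rng.uniform(0.0, 6.0, T)                  # externally chosen query points
y_obs = f(x_obs) + rng.normal(0.0, 0.5, T)        # noisy function values

theta = 0.0
for k in range(T):
    # Kernel weight: only observations near the current iterate influence the update.
    w = np.exp(-0.5 * ((x_obs[k] - theta) / DELTA) ** 2) / DELTA
    theta += EPS * w * y_obs[k]

print("estimated root (true root is 3.0):", theta)
```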
...