Non-Markovian Reinforcement Learning using Fractional Dynamics

Gaurav Gupta, Chenzhong Yin, Jyotirmoy V. Deshmukh, Paul Bogdan
2021 60th IEEE Conference on Decision and Control (CDC)
Reinforcement learning (RL) is a technique for learning the control policy of an agent that interacts with a stochastic environment. In any given state, the agent takes some action, and the environment determines the probability distribution over the next state and gives the agent some reward. Most RL algorithms assume that the environment satisfies the Markov assumption (i.e., the probability distribution over the next state depends only on the current state). In this paper, we…
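The agent-environment loop described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's method: the two-state environment, its transition table, and the names `TRANSITIONS`, `step`, `run_episode`, and `random_policy` are all hypothetical. The sketch also conditions the next-state distribution on the chosen action as well as the current state, which is the usual form of the Markov assumption in a controlled setting.

```python
import random

# Hypothetical two-state, two-action environment, invented for illustration.
# TRANSITIONS[state][action] -> list of (next_state, probability, reward)
TRANSITIONS = {
    0: {0: [(0, 0.9, 0.0), (1, 0.1, 1.0)],
        1: [(0, 0.2, 0.0), (1, 0.8, 1.0)]},
    1: {0: [(0, 0.7, 0.0), (1, 0.3, 1.0)],
        1: [(0, 0.1, 0.0), (1, 0.9, 1.0)]},
}


def step(state, action, rng):
    """Sample (next_state, reward).

    Markov assumption: the distribution depends only on the current
    state and action, never on earlier history.
    """
    outcomes = TRANSITIONS[state][action]
    weights = [prob for _, prob, _ in outcomes]
    next_state, _, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward


def random_policy(state, rng):
    """Placeholder policy: pick one of the two actions uniformly."""
    return rng.choice([0, 1])


def run_episode(policy, steps=100, seed=0):
    """Run the agent-environment loop and return the total reward."""
    rng = random.Random(seed)
    state, total = 0, 0.0
    for _ in range(steps):
        action = policy(state, rng)
        state, reward = step(state, action, rng)
        total += reward
    return total
```

Calling `run_episode(random_policy)` returns the reward accumulated over 100 steps. A non-Markovian environment would differ in that `step` would also need access to earlier states or actions.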

Reinforcement Learning with Non-Markovian Rewards

To address the problem of policy learning from experience with non-Markovian rewards (NMR), four combinations of the classical RL algorithms Q-learning and R-max with automata learning algorithms are described and evaluated empirically, yielding new RL algorithms for domains with NMR.

On Q-learning Convergence for Non-Markov Decision Processes

It is proved that the convergence guarantee of Q-learning can be extended to a class of such non-MDP problems, in particular to some non-stationary domains. It is also shown that state-uniformity of the optimal Q-value function is a necessary and sufficient condition for Q-learning to converge, even in the case of infinitely many internal states.

Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models

This paper proposes probabilistic ensembles with trajectory sampling (PETS), a new algorithm that combines uncertainty-aware deep network dynamics models with sampling-based uncertainty propagation. PETS matches the asymptotic performance of model-free algorithms on several challenging benchmark tasks while requiring significantly fewer samples.

The Optimal Control of Partially Observable Markov Processes over a Finite Horizon

If there are only a finite number of control intervals remaining, then the optimal payoff function is a piecewise-linear, convex function of the current state probabilities of the internal Markov process, and an algorithm for utilizing this property to calculate the optimal control policy and payoff function for any finite horizon is outlined.

Nonparametric General Reinforcement Learning

It is proved that Thompson sampling is asymptotically optimal in stochastic environments in the sense that its value converges to the value of the optimal policy, and Thompson sampling achieves sublinear regret in these environments.

Blending MPC & Value Function Approximation for Efficient Reinforcement Learning

This work presents a framework for improving on MPC with model-free reinforcement learning (RL), and shows how error from inaccurate models in MPC and value function estimation in RL can be balanced.

Non-Markovian Control with Gated End-to-End Memory Policy Networks

This paper uses a model-free, value-based algorithm to learn policies for partially observed domains with a memory-enhanced neural network, the Gated End-to-End Memory Network, applied to sequential control.

Agnostic System Identification for Model-Based Reinforcement Learning

It is shown that any no-regret online learning algorithm can be used to obtain a near-optimal policy, provided that some model achieves low training error and that a good exploration distribution is available.

Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning

It is demonstrated that neural network dynamics models can in fact be combined with model predictive control (MPC) to achieve excellent sample complexity in a model-based reinforcement learning algorithm, producing stable and plausible gaits that accomplish various complex locomotion tasks.