Corpus ID: 246608284

Exploiting Action Impact Regularity and Exogenous State Variables for Offline Reinforcement Learning

Vincent Liu, James Wright, Martha White
Offline reinforcement learning—learning a policy from a batch of data—is known to be hard for general MDPs. In this work, we explore a restricted class of MDPs to obtain guarantees for offline reinforcement learning. The key property, which we call Action Impact Regularity (AIR), is that actions primarily impact a part of the state (an endogenous component) with limited impact on the remaining part of the state (an exogenous component). We propose an algorithm that exploits the AIR property… 
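
For intuition, here is a toy sketch of an MDP with the AIR property (illustrative only, not the paper's environment): the exogenous component, a market-like price, evolves independently of the action, while the action moves only the endogenous component, an inventory level. All names and dynamics below are assumptions.

```python
import random

# Toy MDP with the AIR property: the exogenous state (price) evolves
# independently of the action; the action only changes the endogenous
# state (inventory). Names and dynamics are illustrative assumptions,
# not the paper's environment.

def step(price, inventory, action):
    # Exogenous transition: a random walk, unaffected by the action.
    next_price = price + random.gauss(0.0, 1.0)
    # Endogenous transition: the action (buy=+1, hold=0, sell=-1)
    # directly moves the inventory.
    next_inventory = inventory + action
    # Reward may depend on both components: buying costs the price,
    # selling earns it.
    reward = -action * price
    return next_price, next_inventory, reward

if __name__ == "__main__":
    price, inventory = 100.0, 0
    for a in [1, 0, -1]:
        price, inventory, r = step(price, inventory, a)
        print(f"price={price:.2f} inventory={inventory} reward={r:.2f}")
```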

Off-Policy Deep Reinforcement Learning without Exploration
This paper introduces a novel class of off-policy algorithms, batch-constrained reinforcement learning, which restricts the action space in order to force the agent towards behaving close to on-policy with respect to a subset of the given data.
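
For intuition, a minimal tabular sketch of the batch-constrained idea follows (the paper's actual method, BCQ, is a deep-learning algorithm): the bootstrapped maximum is taken only over next-state actions with enough support in the batch. The count table and threshold are illustrative assumptions.

```python
import numpy as np

# Tabular sketch of a batch-constrained Q backup: the max over next
# actions is restricted to actions with sufficient support in the
# dataset. `counts[s, a]` holds how often (s, a) occurs in the batch;
# the threshold and all names are illustrative assumptions.

def bcq_backup(Q, counts, s, a, r, s_next, gamma=0.99, lr=0.1, min_count=1):
    supported = np.where(counts[s_next] >= min_count)[0]
    if len(supported) > 0:
        target = r + gamma * Q[s_next, supported].max()
    else:
        target = r  # no supported next action: fall back to the reward alone
    Q[s, a] += lr * (target - Q[s, a])
    return Q
```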
Discovering and Removing Exogenous State Variables and Rewards for Reinforcement Learning
This paper formalizes exogenous state variables and rewards, and identifies conditions under which an MDP with exogenous state can be decomposed into an exogenous Markov reward process, involving only the exogenous state and reward, and an endogenous Markov decision process defined with respect to only the endogenous reward.
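
In the spirit of this decomposition, one crude way to probe whether a candidate state variable is exogenous is to check whether the distribution of its change depends on the action. The sketch below is illustrative only, not the paper's discovery algorithm.

```python
import numpy as np

# Crude exogeneity check, illustrative only: if a state variable is
# exogenous, the distribution of its change should not depend on the
# action. Here we just compare per-action mean changes; the paper's
# actual discovery method is more principled.

def looks_exogenous(deltas, actions, tol=0.05):
    """deltas[i]: change in the candidate variable at step i (numpy array);
    actions[i]: action taken at step i (numpy array)."""
    means = [deltas[actions == a].mean() for a in np.unique(actions)]
    return max(means) - min(means) < tol
```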
Provably Good Batch Reinforcement Learning Without Great Exploration
It is shown that a small modification to the Bellman optimality and evaluation backups, making the update more conservative, yields much stronger guarantees on the performance of the output policy; in certain settings, the resulting methods find an approximately best policy within the state-action space explored by the batch data, without requiring a priori concentrability assumptions.
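
A tabular sketch of this conservative-backup idea follows (illustrative; the paper's operators and guarantees are more refined): state-action pairs with too little coverage in the batch are backed up to a worst-case value instead of the bootstrapped estimate. The count threshold and V_MIN are assumptions.

```python
import numpy as np

# Pessimistic backup sketch: poorly-covered (s, a) pairs are clipped to
# the worst-case value V_MIN instead of trusting the bootstrapped
# estimate. One sweep of value iteration; names are illustrative.

V_MIN = 0.0  # assumed lower bound on returns

def conservative_backup(Q, counts, P, R, gamma=0.99, min_count=5):
    # P has shape (nS, nA, nS); R has shape (nS, nA).
    nS, nA = Q.shape
    V = Q.max(axis=1)
    for s in range(nS):
        for a in range(nA):
            if counts[s, a] < min_count:
                Q[s, a] = V_MIN  # not enough data: be pessimistic
            else:
                Q[s, a] = R[s, a] + gamma * P[s, a] @ V
    return Q
```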
Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning
This paper designs hyperparameter-free algorithms for policy selection based on BVFT, a recent theoretical advance in value-function selection, and demonstrates their effectiveness in discrete-action benchmarks such as Atari.
Behavior Regularized Offline Reinforcement Learning
A general framework, behavior regularized actor critic (BRAC), is introduced to empirically evaluate recently proposed methods as well as a number of simple baselines across a variety of offline continuous control tasks.
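
The core of a behavior-regularized objective can be written in a few lines for a single state with discrete actions; the sketch below is only for intuition, since BRAC itself is an actor-critic framework for continuous control, and the KL penalty and `alpha` here are illustrative choices.

```python
import numpy as np

# Behavior-regularized objective sketch for one state with discrete
# actions: maximize expected Q minus a KL penalty toward the behavior
# policy. BRAC proper is an actor-critic for continuous control; this
# tabular form is only for intuition.

def brac_objective(pi, pi_b, q, alpha=0.1):
    kl = np.sum(pi * np.log(pi / pi_b))
    return np.dot(pi, q) - alpha * kl

pi_b = np.array([0.5, 0.3, 0.2])       # behavior policy at this state
q = np.array([1.0, 2.0, 0.5])          # estimated action values
greedy = np.array([0.01, 0.98, 0.01])  # near-greedy candidate
print(brac_objective(greedy, pi_b, q), brac_objective(pi_b, pi_b, q))
```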
Near-Optimal Reinforcement Learning in Polynomial Time
New algorithms for reinforcement learning are presented and it is proved that they have polynomial bounds on the resources required to achieve near-optimal return in general Markov decision processes.
High Confidence Policy Improvement
We present a batch reinforcement learning (RL) algorithm that provides probabilistic guarantees about the quality of each policy that it proposes, and which has no hyper-parameters that require expert tuning.
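
A minimal sketch of the high-confidence test follows (illustrative; the paper derives tighter concentration bounds): per-trajectory importance-weighted returns estimate the candidate policy's value, and the candidate is returned only if a one-sided lower confidence bound beats the baseline.

```python
import numpy as np
from scipy import stats

# Sketch of a high-confidence policy improvement test: importance-
# weighted per-trajectory returns estimate the candidate policy's
# value, and the candidate is accepted only if a one-sided t-based
# lower bound exceeds the baseline. The t-test form is a simplifying
# assumption; the paper uses tighter concentration bounds.

def is_safe_improvement(weighted_returns, baseline, delta=0.05):
    n = len(weighted_returns)
    mean = np.mean(weighted_returns)
    sem = np.std(weighted_returns, ddof=1) / np.sqrt(n)
    lower = mean - stats.t.ppf(1 - delta, df=n - 1) * sem
    return lower >= baseline
```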
Reinforcement Learning in Environments with Independent Delayed-sense Dynamics
This thesis develops four reinforcement learning algorithms that exploit the structure of independent delayed-sense dynamics (IDSD) problems to achieve better efficiency, and shows experimentally that these algorithms evaluate a given policy more accurately than the corresponding TD(0) methods.
On the Sample Complexity of Batch Reinforcement Learning with Policy-Induced Data
We study the fundamental question of the sample complexity of learning a good policy in finite Markov decision processes (MDPs) when the data available for learning is obtained by following a logging policy.
On the sample complexity of reinforcement learning.
Novel algorithms with more restricted guarantees are suggested whose sample complexities are again independent of the size of the state space and depend linearly on the complexity of the policy class, but have only a polynomial dependence on the horizon time.