Corpus ID: 231861555

Simple Agent, Complex Environment: Efficient Reinforcement Learning with Agent State

Shi Dong, Benjamin Van Roy, Zhengyuan Zhou
We design a simple reinforcement learning (RL) agent that implements an optimistic version of Q-learning and establish through regret analysis that this agent can operate with some level of competence in any environment. While we leverage concepts from the literature on provably efficient RL, we consider a general agent-environment interface and provide a novel agent design and analysis. This level of generality positions our results to inform the design of future agents for operation in…
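The optimistic Q-learning scheme mentioned in the abstract can be illustrated with a generic sketch. Everything below is an illustrative assumption rather than the paper's exact algorithm: the `ChainEnv` toy environment, the optimistic initialization at `1/(1-gamma)`, and the count-based bonus `c / sqrt(n)` are standard devices from the provably-efficient-RL literature, not the authors' construction.

```python
import numpy as np

class ChainEnv:
    """Toy deterministic chain (an illustrative assumption, not the
    paper's environment): action 1 moves right, action 0 resets to the
    start; reward 1 is earned only in the last state."""
    def __init__(self, n=5):
        self.n, self.s = n, 0
    def reset(self):
        self.s = 0
        return self.s
    def step(self, a):
        self.s = min(self.s + 1, self.n - 1) if a == 1 else 0
        return self.s, float(self.s == self.n - 1), False

def optimistic_q_learning(env, n_states, n_actions, episodes=200,
                          horizon=20, c=1.0, gamma=0.9):
    """Tabular Q-learning with optimistic initialization and a
    count-based exploration bonus (a generic sketch)."""
    Q = np.full((n_states, n_actions), 1.0 / (1.0 - gamma))  # optimistic init
    counts = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()
        for _ in range(horizon):
            a = int(np.argmax(Q[s]))            # greedy w.r.t. optimistic Q
            s_next, r, done = env.step(a)
            counts[s, a] += 1
            alpha = 1.0 / counts[s, a]          # decaying step size
            bonus = c / np.sqrt(counts[s, a])   # optimism drives exploration
            Q[s, a] += alpha * (r + bonus + gamma * Q[s_next].max() - Q[s, a])
            s = s_next
            if done:
                break
    return Q
```

Because the initial Q-values overestimate every action, the greedy rule systematically tries under-visited actions until their bonuses shrink, which is the mechanism behind optimism-based regret bounds.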


Theoretical Guarantees of Fictitious Discount Algorithms for Episodic Reinforcement Learning and Global Convergence of Policy Gradient Methods
This paper focuses on two vanilla policy gradient variants: the first is a widely used variant with discounted advantage estimation (DAE); the second adds a fictitious discount factor in the score functions of the policy gradient estimators. Non-asymptotic convergence guarantees are established for both algorithms.


Q-learning for history-based reinforcement learning
This work extends the Q-learning algorithm from the Markov decision process setting to problems where observations are non-Markov and do not reveal the full state of the world, i.e., to POMDPs, and provides a better combination of computational, memory, and data efficiency than existing algorithms in this setting.
Is Q-learning Provably Efficient?
Model-free reinforcement learning (RL) algorithms, such as Q-learning, directly parameterize and update value functions or policies without explicitly modeling the environment. They are typically…
Provably Efficient Reinforcement Learning with Aggregated States
This work establishes that an optimistic variant of Q-learning applied to a fixed-horizon episodic Markov decision process with an aggregated state representation incurs sublinear regret; this is the first such result that applies to reinforcement learning with nontrivial value function approximation without any restrictions on transition probabilities.
Near-Optimal Reinforcement Learning in Polynomial Time
New algorithms for reinforcement learning are presented, and it is proved that they have polynomial bounds on the resources required to achieve near-optimal return in general Markov decision processes.
Feature Reinforcement Learning: State of the Art
This paper examines the progress since Feature RL's inception; the framework now has both model-based and model-free cost functions, and has most recently been extended to the function approximation setting.
Provably efficient RL with Rich Observations via Latent State Decoding
This work demonstrates how to inductively estimate a mapping from observations to latent states through a sequence of regression and clustering steps, and uses the estimated mapping to construct good exploration policies.
Discrete Dynamic Programming with a Small Interest Rate
In a fundamental paper on stationary finite state and action Markovian decision processes, Blackwell defines an optimal policy to be one that maximizes the expected total discounted…
Performance Loss Bounds for Approximate Value Iteration with State Aggregation
We consider approximate value iteration with a parameterized approximator in which the state space is partitioned and the optimal cost-to-go function over each partition is approximated by a…
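The idea of value iteration over an aggregated (partitioned) state space can be sketched generically. The tabular MDP encoding, the partition map `phi`, and the within-partition averaging projection below are illustrative assumptions, not the construction analyzed in the cited paper.

```python
import numpy as np

def aggregated_value_iteration(P, R, phi, n_clusters, gamma=0.9, iters=200):
    """Value iteration with a value function constrained to be
    piecewise constant over a partition of the state space.

    P:   (A, S, S) array of transition probabilities P[a, s, s']
    R:   (S, A) array of one-step rewards
    phi: length-S array mapping each state to its partition index
    """
    w = np.zeros(n_clusters)          # one value per partition
    for _ in range(iters):
        V = w[phi]                    # lift partition values to states
        # Bellman backup at every state, then act greedily.
        Q = R + gamma * np.einsum('ast,t->sa', P, V)
        V_new = Q.max(axis=1)
        # Project back onto the partition: average within each cluster.
        for k in range(n_clusters):
            w[k] = V_new[phi == k].mean()
    return w
```

The averaging step is a nonexpansive projection, so the composed update remains a gamma-contraction and the partition values converge, at the cost of an approximation error that depends on how well the optimal cost-to-go is constant within each partition.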
A Cost-Shaping Linear Program for Average-Cost Approximate Dynamic Programming with Performance Guarantees
A bound is established on the performance of the resulting policy that scales gracefully with the number of states, without imposing the strong Lyapunov condition required by its counterpart in de Farias and Van Roy.
Discrete Dynamic Programming
Converting an optimization problem into a discrete network of nodes and links is the main challenge in using dynamic programming; solving the problem once the network is constructed is relatively easy.