Partially Observable RL with B-Stability: Unified Structural Condition and Sharp Sample-Efficient Algorithms

Fan Chen, Yu Bai, Song Mei


FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs

Under the assumption that the underlying (unknown) dynamics correspond to a low rank transition matrix, it is shown how the representation learning question is related to a particular non-linear matrix decomposition problem.

Contextual Decision Processes with low Bellman rank are PAC-Learnable

A complexity measure, the Bellman rank, is presented that enables tractable learning of near-optimal behavior in CDPs, is naturally small for many well-studied RL models, and provides new insights into efficient exploration for RL with function approximation.

Provably efficient RL with Rich Observations via Latent State Decoding

This work demonstrates how to estimate a mapping from the observations to latent states inductively, through a sequence of regression and clustering steps, and uses it to construct good exploration policies.

Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings

This work studies reinforcement learning with function approximation for large-scale Partially Observable Markov Decision Processes (POMDPs) where the state space and observation space are large or even continuous and provides a computationally and statistically efficient algorithm for determining the exact optimal policy.

Embed to Control Partially Observed Systems: Representation Learning with Provable Sample Efficiency

This work proposes a reinforcement learning algorithm named ETC, which learns the representation at two levels while optimizing the policy in POMDPs with infinite observation and state spaces, and proposes a framework that allows a variety of estimators (including maximum likelihood estimators and generative adversarial networks) to be integrated.

Reinforcement Learning of POMDPs using Spectral Methods

This work proposes a new reinforcement learning algorithm for partially observable Markov decision processes (POMDPs) based on spectral decomposition methods and proves an order-optimal regret bound with respect to the optimal memoryless policy, with efficient scaling with respect to the dimensionality of the observation and action spaces.

Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems

This work proposes an actor-critic style algorithm that is capable of performing agnostic policy learning and is even capable of competing against the globally optimal policy without paying an exponential dependence on the horizon in its sample complexity.

Bellman Eluder Dimension: New Rich Classes of RL Problems, and Sample-Efficient Algorithms

A new complexity measure, the Bellman Eluder (BE) dimension, is introduced, and it is proved that the two algorithms studied, GOLF and OLIVE, learn near-optimal policies of low BE dimension problems in a number of samples that is polynomial in all relevant parameters but independent of the size of the state-action space.

Approximate Planning in Large POMDPs via Reusable Trajectories

Upper bounds on the sample complexity are proved showing that, even for infinitely large and arbitrarily complex POMDPs, the amount of data needed can be finite, and depends only linearly on the complexity of the restricted strategy class Π, and exponentially on the horizon time.

Efficient learning and planning with compressed predictive states

The notion of compressed PSRs (CPSRs) is introduced, and it is shown how this approach provides a principled avenue for learning accurate approximations of PSRs, drastically reducing the computational costs associated with learning while also providing effective regularization.