Corpus ID: 10405371

Predictive State Temporal Difference Learning

Byron Boots, Geoffrey J. Gordon
We propose a new approach to value function approximation which combines linear temporal difference reinforcement learning with subspace identification. In practical applications, reinforcement learning (RL) is complicated by the fact that state is either high-dimensional or partially observable. Therefore, RL methods are designed to work with features of state rather than state itself, and the success or failure of learning is often determined by the suitability of the selected features. By… 

Spectral Approaches to Learning Predictive Representations

A central problem in artificial intelligence is to choose actions to maximize reward in a partially observable, uncertain environment. To do so, we must obtain an accurate environment model, and then

Incremental Basis Construction from Temporal Difference Error

This result suggests a novel method for improving value-function estimation: a primary reinforcement learner estimates its value function using its present basis functions; it then sends its TD error to a secondary learner, which interprets that error as a reward function and estimates the corresponding value function.

Practical Learning of Predictive State Representations

This work presents Inference Gradients, a simple, fast, and robust method for practical learning of PSRs that combines spectral algorithms for PSRs, used as a consistent and efficient initialization, with PSIM-style updates to refine the resulting model parameters.

An Online Spectral Learning Algorithm for Partially Observable Nonlinear Dynamical Systems

A new online spectral algorithm is proposed, which uses tricks such as incremental Singular Value Decomposition (SVD) and random projections to scale to much larger data sets and more complex systems than previous methods.

Efficient learning and planning with compressed predictive states

The notion of compressed PSRs (CPSRs) is introduced, and it is shown how this approach provides a principled avenue for learning accurate approximations of PSRs, drastically reducing the computational costs associated with learning while also providing effective regularization.

Efficient Methods for Prediction and Control in Partially Observable Environments

The proposed framework for constructing state estimators enjoys a number of theoretical and practical advantages over existing methods, and its efficacy is demonstrated in a prediction setting, where the task is to predict future observations, as well as in a control setting, where the task is to optimize a control policy via reinforcement learning.

Thesis Proposal: Efficient and Tractable Methods for System Identification through Supervised Learning

This work develops a class of dynamical systems and an associated learning meta-algorithm resulting in a framework for system identification that enjoys several theoretical and practical advantages and results in an efficient and local minima-free method for learning non-linear partially observable continuous systems.

Learning Dynamic Policies from Demonstration

It is shown that system identification algorithms with desirable properties like the ability to model long-range dependencies, statistical consistency, and efficient off-the-shelf implementations can be carried over to the learning from demonstration domain.

On the generation of representations for reinforcement learning

It is proved that under certain technical conditions, the size of the dictionary will always grow sub-linearly in the number of data points, and, as a consequence, the kernel linear regressor or value function estimator constructed from the resulting dictionary is consistent.

Learning to Filter with Predictive State Inference Machines

This work presents the Predictive State Inference Machine (PSIM), a data-driven method that considers the inference procedure on a dynamical system as a composition of predictors and directly learns predictors for inference in predictive state space.



Least-Squares Policy Iteration

The new algorithm, least-squares policy iteration (LSPI), learns the state-action value function, which allows for action selection without a model and for incremental policy improvement within a policy-iteration framework.

Least-Squares Temporal Difference Learning

This paper presents a simpler derivation of the LSTD algorithm, which generalizes from λ = 0 to arbitrary values of λ; at the extreme of λ = 1, the resulting algorithm is shown to be a practical formulation of supervised linear regression.
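
As a rough illustration of the idea summarized above, the following is a minimal sketch of least-squares TD with an eligibility-trace parameter λ. It is not taken from the paper: the function name `lstd_lambda`, the `(phi, reward, phi_next)` transition format, and the small `ridge` term added for numerical stability are all assumptions made for this example.

```python
import numpy as np

def lstd_lambda(episodes, n_features, gamma=1.0, lam=0.0, ridge=1e-6):
    """Sketch of LSTD with eligibility traces (hypothetical helper).

    episodes: list of trajectories; each trajectory is a list of
              (phi, reward, phi_next) tuples, with phi_next set to None
              when the transition ends the episode.
    Returns weights w such that V(s) is approximated by phi(s) @ w.
    """
    # The ridge term is an assumption of this sketch; it keeps A invertible.
    A = ridge * np.eye(n_features)
    b = np.zeros(n_features)
    for episode in episodes:
        z = np.zeros(n_features)  # eligibility trace, reset per episode
        for phi, reward, phi_next in episode:
            z = gamma * lam * z + phi
            nxt = np.zeros(n_features) if phi_next is None else phi_next
            A += np.outer(z, phi - gamma * nxt)
            b += z * reward
    return np.linalg.solve(A, b)

# Toy deterministic chain s0 -> s1 -> s2 -> terminal, reward 1 per step,
# with one-hot (tabular) features. The data is made up for illustration.
I = np.eye(3)
chain = [[(I[0], 1.0, I[1]), (I[1], 1.0, I[2]), (I[2], 1.0, None)]]
w_td = lstd_lambda(chain, n_features=3, lam=0.0)  # one-step TD targets
w_mc = lstd_lambda(chain, n_features=3, lam=1.0)  # regression on full returns
```

On this toy chain both settings of λ recover approximately the same values, since the features are tabular and the data is deterministic; with λ = 1 the accumulated system reduces to least-squares regression of features onto observed returns.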

Learning predictive state representations using non-blind policies

This work presents two methods for fixing this limitation in most of the existing PSR algorithms: one when the policy is known and one when it is not, and presents an efficient optimization for computing good exploration policies to be used when learning a PSR.

Closing the learning-planning loop with predictive state representations

A novel algorithm is proposed which provably learns a compact, accurate model directly from sequences of action-observation pairs, and is evaluated in a simulated, vision-based mobile robot planning task, showing that the learned PSR captures the essential features of the environment and enables successful and efficient planning.

Predictive Representations of State

This is the first specific formulation of the predictive idea that includes both stochasticity and actions (controls) and it is shown that any system has a linear predictive state representation with number of predictions no greater than the number of states in its minimal POMDP model.

Linear Least-Squares algorithms for temporal difference learning

Two new temporal difference algorithms based on the theory of linear least-squares function approximation, LS TD and RLS TD, are introduced, and probability-one convergence is proved for LS TD when it is used with a function approximator linear in the adjustable parameters.

Learning low dimensional predictive representations

This work provides an efficient principal-components-based algorithm for learning transformed predictive state representations (TPSRs), and shows that TPSRs can perform well in comparison to Hidden Markov Models learned with Baum-Welch in a real-world robot tracking task for low-dimensional representations and long prediction horizons.

Improving Approximate Value Iteration Using Memories and Predictive State Representations

This paper shows how to apply point-based techniques to new models for non-Markovian dynamical systems called Predictive State Representations (PSRs) and Memory-PSRs (mPSRs).

Representation Policy Iteration

A novel theoretically rigorous framework is proposed that automatically generates geometrically customized orthonormal sets of basis functions, which can be used with any approximate MDP solver like least-squares policy iteration (LSPI).

Predictive State Representations: A New Theory for Modeling Dynamical Systems

This work introduces an interesting construct, the system-dynamics matrix, and shows how PSRs can be derived simply from it, and uses this construct to show formally that PSRs are more general than both nth-order Markov models and HMMs/POMDPs.