Robert William Wright

Approximate value iteration methods for reinforcement learning (RL) generalize experience from limited samples across large state-action spaces. The function approximators used in such methods typically introduce errors in value estimation which can harm the quality of the learned value functions. We present a new batch-mode, off-policy, approximate value(More)
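Below is a minimal sketch of a batch-mode, off-policy approximate value iteration loop in the fitted Q-iteration style, for illustration of the general setting only; it is not the paper's algorithm. The transition format, the `ExtraTreesRegressor` approximator, and all hyperparameters are assumptions.

```python
# Illustrative batch-mode, off-policy AVI (fitted Q-iteration style).
# Assumes a fixed dataset of (state, action, reward, next_state, done)
# tuples and a small discrete action set; not the paper's method.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration(transitions, n_actions, gamma=0.99, iterations=50):
    """transitions: list of (state, action, reward, next_state, done)."""
    states = np.array([t[0] for t in transitions])
    actions = np.array([t[1] for t in transitions])
    rewards = np.array([t[2] for t in transitions])
    next_states = np.array([t[3] for t in transitions])
    dones = np.array([t[4] for t in transitions], dtype=float)

    X = np.column_stack([states, actions])  # regressor input: (s, a)
    model = None
    for _ in range(iterations):
        if model is None:
            targets = rewards  # first pass: immediate reward only
        else:
            # 1-step return target: r + gamma * max_a' Q(s', a')
            q_next = np.column_stack([
                model.predict(np.column_stack(
                    [next_states, np.full(len(next_states), a)]))
                for a in range(n_actions)
            ])
            targets = rewards + gamma * (1.0 - dones) * q_next.max(axis=1)
        model = ExtraTreesRegressor(n_estimators=50).fit(X, targets)
    return model
```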
Reinforcement learning (RL) is designed to learn optimal control policies from unsupervised interactions with the environment. Many successful RL algorithms have been developed, however, none of them can efficiently tackle problems with high-dimensional state spaces due to the "curse of dimensionality," and so their applicability to real-world scenarios is(More)
Fitted Q-Iteration (FQI) is a popular approximate value iteration (AVI) approach that makes effective use of off-policy data. FQI uses a 1-step return value update which does not exploit the sequential nature of trajectory data. Complex returns (weighted averages of the n-step returns) use trajectory data more effectively, but have not been used in an AVI(More)
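The following sketch illustrates the distinction drawn above: an n-step return bootstraps from a value estimate n steps ahead, and a complex return averages the n-step returns along a trajectory. The geometric (lambda-return style) weights are an assumed, illustrative choice, not the weighting scheme from the paper.

```python
# Illustrative n-step returns and a weighted-average "complex return"
# over one trajectory, assuming an estimated state-value array `values`
# aligned with `rewards`. Weights are illustrative geometric weights.
import numpy as np

def n_step_return(rewards, values, t, n, gamma=0.99):
    """n-step return from time t: discounted rewards for n steps plus a
    bootstrapped value estimate at t+n (if the trajectory continues)."""
    T = len(rewards)
    end = min(t + n, T)
    g = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
    if end < T:  # bootstrap only if the trajectory has not ended
        g += gamma ** (end - t) * values[end]
    return g

def complex_return(rewards, values, t, gamma=0.99, lam=0.9):
    """Weighted average of all n-step returns from time t, using
    normalized geometric weights lam**(n-1)."""
    T = len(rewards)
    ns = range(1, T - t + 1)
    weights = np.array([lam ** (n - 1) for n in ns])
    weights /= weights.sum()
    returns = np.array([n_step_return(rewards, values, t, n, gamma)
                        for n in ns])
    return float(weights @ returns)
```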
Approximate value iteration (AVI) is a widely used technique in reinforcement learning. Most AVI methods do not take full advantage of the sequential relationship between samples within a trajectory in deriving value estimates, due to the challenges in dealing with the inherent bias and variance in the n-step returns. We propose a bounding method which uses(More)