Reinforcement Learning in Continuous State and Action Spaces

  Title: Reinforcement Learning in Continuous State and Action Spaces
  Author: Hado Philip van Hasselt
  In: Reinforcement Learning
Many traditional reinforcement-learning algorithms have been designed for problems with small, finite state and action spaces. […] We show how to apply these methods to reinforcement-learning problems and discuss many specific algorithms. Amongst others, we cover gradient-based temporal-difference learning, evolutionary strategies, policy-gradient algorithms, and (natural) actor-critic methods. We discuss the advantages of different approaches and compare the performance of a state-of-the-art actor…

Reinforcement learning in continuous state- and action-space

This thesis investigates how to select the optimal action when artificial neural networks approximate the value function, applying numerical optimization techniques, and proposes two novel algorithms based on two alternative action-selection methods.

Qualitative Transfer for Reinforcement Learning with Continuous State and Action Spaces

A novel approach to transfer knowledge between reinforcement learning tasks with continuous states and actions, where the transition and policy functions are approximated by Gaussian processes (GPs), by using the GPs' hyper-parameters to represent the state-transition function in the source task.

Efficient reinforcement learning in continuous state and action spaces with Dyna and policy approximation

A novel Dyna variant, called Dyna-LSTD-PA, aiming to handle problems with continuous action spaces, which outperforms two representative methods in terms of convergence rate, success rate, and stability performance on four benchmark RL problems.

A scalable species-based genetic algorithm for reinforcement learning problems

A novel variant of genetic algorithm called SP-GA is proposed which utilizes a species-inspired weight initialization strategy and trains a population of deep neural networks, each estimating the Q-function for the RL problem.

Directed Exploration in Black-Box Optimization for Multi-Objective Reinforcement Learning

Usually, real-world problems involve the optimization of multiple, possibly conflicting, objectives. Such problems may be addressed by multi-objective reinforcement learning (MORL) techniques.

Off-Policy Neural Fitted Actor-Critic

A new off-policy, offline, model-free actor-critic reinforcement learning algorithm for environments with continuous states and actions is presented, which allows a trade-off between data efficiency and scalability.

Transfer Learning for Continuous State and Action Spaces

This work presents a novel approach to transfer knowledge between tasks in a reinforcement learning (RL) framework with continuous states and actions, where the transition and policy functions are approximated by Gaussian processes.

Incremental reinforcement learning for multi-objective robotic tasks

A different approach to learning problems with more than one objective, consisting of a cyclical process of small perturbations and stabilizations, which tries to avoid degrading the system's performance while it searches for a new valid policy that also optimizes a sub-objective.

Continuous action reinforcement learning for control-affine systems with unknown dynamics

This article proposes sampling-based task learning for control-affine nonlinear systems through the combined learning of both state and action-value functions in a model-free approximate value iteration setting with continuous inputs.

The Challenges of Reinforcement Learning in Robotics and Optimal Control

This paper discusses the widely used RL algorithm Q-learning, which can be adapted to work in continuous state and action spaces; methods for computing rewards that yield an adaptive optimal controller and accelerate the learning process; and, finally, safe-exploration approaches.
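One common way to adapt Q-learning beyond small finite state spaces, as surveyed above, is to replace the Q-table with linear function approximation over state features while keeping a discretized action set. A minimal sketch of one such update step (the feature map `phi`, weight matrix `W`, and step sizes here are illustrative assumptions, not the paper's specific formulation):

```python
import numpy as np

def q_learning_step(W, phi, a_idx, r, phi2, alpha=0.1, gamma=0.99):
    """One Q-learning update with linear features and a discrete
    action set: W holds one weight row per action, so Q(s, a) is
    W[a] @ phi(s)."""
    q_next = W @ phi2                  # Q(s', a') for every action a'
    target = r + gamma * q_next.max()  # max-action bootstrap target
    delta = target - W[a_idx] @ phi    # TD error for the taken action
    W[a_idx] += alpha * delta * phi    # gradient step on that row only
    return delta
```

Each action's Q-estimate shares the same state features, so only the row of the action actually taken is updated.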

Binary action search for learning continuous-action control policies

Binary Action Search eliminates the restrictive modification steps of Adaptive Action Modification and requires no temporal action locality in the domain and can be combined with any discrete-action reinforcement learning algorithm for learning continuous-action policies.
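The core idea of Binary Action Search is that any discrete-action learner only has to make binary "upper half / lower half" decisions that progressively narrow the continuous action interval. A minimal sketch, assuming some already-learned binary decision policy `decide` (a hypothetical stand-in for the discrete RL component):

```python
def binary_action_search(decide, state, lo, hi, depth=10):
    """Refine a continuous action by repeated binary decisions.

    `decide(state, lo, hi)` is any learned discrete policy returning
    True to keep the upper half of [lo, hi] and False for the lower
    half -- e.g. a greedy policy from a discrete-action RL algorithm.
    After `depth` halvings, the interval midpoint is the action.
    """
    for _ in range(depth):
        mid = 0.5 * (lo + hi)
        if decide(state, lo, hi):
            lo = mid   # keep upper half
        else:
            hi = mid   # keep lower half
    return 0.5 * (lo + hi)
```

With `depth` binary decisions the action resolution is `(hi - lo) / 2**depth`, so the discrete learner's action set stays size two regardless of the desired precision.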

Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces

This article proposes a simple and modular technique that can be used to implement function approximators with nonuniform degrees of resolution so that the value function can be represented with higher accuracy in important regions of the state and action spaces.

Using continuous action spaces to solve discrete problems

This work shows that Cacla retains much better performance when the action space is changed by removing some actions after some time of learning, even though its continuous actions get rounded to actions in the same finite action space that may contain only a small number of actions.

Natural actor-critic algorithms

Reinforcement Learning in Continuous Action Spaces

  • H. van Hasselt, M. Wiering
  • Computer Science
  • 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, 2007
This work presents a new class of algorithms named continuous actor critic learning automaton (CACLA) that can handle continuous states and actions and shows that CACLA performs much better than the other algorithms, especially when it is combined with a Gaussian exploration method.
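CACLA's distinctive rule is that the actor is moved toward the explored action only when the temporal-difference error is positive, i.e. only when the exploratory action turned out better than expected. A minimal tabular sketch of one such step (the dict-based critic and scalar actions are simplifying assumptions for illustration):

```python
def cacla_step(V, actor, s, a, r, s2, alpha=0.1, beta=0.1, gamma=0.99):
    """One CACLA-style update with a tabular critic V and a tabular
    actor producing scalar actions.

    The critic learns by ordinary TD(0); the actor output actor[s]
    moves toward the explored action `a` only if the TD error is
    positive (the sign of delta, not its size, gates the update).
    """
    delta = r + gamma * V[s2] - V[s]   # TD error of the critic
    V[s] += alpha * delta              # critic update
    if delta > 0:                      # CACLA's sign-based actor rule
        actor[s] += beta * (a - actor[s])
    return delta
```

In the full algorithm the critic and actor are function approximators (e.g. neural networks) and exploration is typically Gaussian noise around the actor's output, which is the combination the paper reports working best.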

Infinite-Horizon Policy-Gradient Estimation

GPOMDP, a simulation-based algorithm for generating a biased estimate of the gradient of the average reward in Partially Observable Markov Decision Processes (POMDPs) controlled by parameterized stochastic policies, is introduced.
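GPOMDP accumulates a discounted eligibility trace of policy score functions and averages reward-weighted traces into a gradient estimate; the discount factor β < 1 trades bias for variance. A minimal sketch over one recorded trajectory (the array shapes and the name `grad_logs` are assumptions for illustration):

```python
import numpy as np

def gpomdp_gradient(grad_logs, rewards, beta=0.9):
    """Biased estimate of the average-reward gradient, GPOMDP-style.

    grad_logs[t] is grad_theta log pi(a_t | s_t) at step t; beta < 1
    discounts the eligibility trace, introducing bias but bounding
    the variance of the estimate.
    """
    z = np.zeros_like(grad_logs[0])      # eligibility trace
    delta = np.zeros_like(grad_logs[0])  # running gradient estimate
    for t, (g, r) in enumerate(zip(grad_logs, rewards)):
        z = beta * z + g                 # discounted score accumulation
        delta += (r * z - delta) / (t + 1)  # running average of r_t * z_t
    return delta
```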

Reinforcement learning for robots using neural networks

This dissertation concludes that it is possible to build artificial agents that can acquire complex control policies effectively by reinforcement learning, enabling its application to complex robot-learning problems.

Algorithms for Reinforcement Learning

This book focuses on those algorithms of reinforcement learning that build on the powerful theory of dynamic programming, and gives a fairly comprehensive catalog of learning problems, and describes the core ideas, followed by the discussion of their theoretical properties and limitations.

Off-Policy Temporal Difference Learning with Function Approximation

The first algorithm for off-policy temporal-difference learning that is stable with linear function approximation is introduced, and it is proved that, given training under any ε-soft policy, the algorithm converges w.p.1 to a close approximation to the action-value function for an arbitrary target policy.
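The corrective ingredient in such off-policy methods is the importance-sampling ratio between target and behaviour policy probabilities. A minimal sketch of one importance-sampled linear TD(0) step (a simplification: the actual algorithm in the paper combines such ratios with eligibility traces):

```python
import numpy as np

def off_policy_td0(theta, phi_s, phi_s2, r, rho, alpha=0.1, gamma=0.99):
    """One importance-sampled linear TD(0) update.

    rho = pi(a|s) / b(a|s) is the ratio of target-policy to
    behaviour-policy probabilities for the taken action; scaling the
    update by rho corrects for training under the behaviour policy
    (e.g. an epsilon-soft policy) while evaluating the target policy.
    """
    delta = r + gamma * theta @ phi_s2 - theta @ phi_s  # TD error
    theta += alpha * rho * delta * phi_s                # weighted update
    return delta
```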

Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding

It is concluded that reinforcement learning can work robustly in conjunction with function approximators, and that there is little justification at present for avoiding the case of general λ.
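Sparse coarse coding of the kind used in that work (tile coding) maps a continuous input to a sparse binary feature vector: several overlapping, slightly offset tilings each contribute exactly one active feature. A minimal 1-D sketch (the offset scheme and tile counts here are illustrative choices, not the paper's exact configuration):

```python
def tile_indices(x, n_tilings=4, n_tiles=8, lo=0.0, hi=1.0):
    """Tile coding for a scalar input: returns one active tile index
    per tiling, i.e. a binary feature vector with exactly n_tilings
    ones out of n_tilings * (n_tiles + 1) features."""
    width = (hi - lo) / n_tiles
    idx = []
    for k in range(n_tilings):
        offset = k * width / n_tilings       # each tiling shifted slightly
        t = int((x - lo + offset) / width)   # which tile x falls in
        t = min(t, n_tiles)                  # extra tile absorbs overflow
        idx.append(k * (n_tiles + 1) + t)    # globally unique index
    return idx
```

A linear value function over these features generalizes between nearby inputs (which share tiles) while keeping every update cheap, since only `n_tilings` weights change per step.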