Reinforcement Learning in Continuous State and Action Spaces
@inproceedings{Hasselt2012ReinforcementLI, title={Reinforcement Learning in Continuous State and Action Spaces}, author={Hado Philip van Hasselt}, booktitle={Reinforcement Learning}, year={2012} }
Many traditional reinforcement-learning algorithms have been designed for problems with small finite state and action spaces. We show how to apply these methods to reinforcement-learning problems with continuous state and action spaces and discuss many specific algorithms. Amongst others, we cover gradient-based temporal-difference learning, evolutionary strategies, policy-gradient algorithms and (natural) actor-critic methods. We discuss the advantages of different approaches and compare the performance of a state-of-the-art actor…
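To make the surveyed family of methods concrete, below is a minimal sketch of an actor-critic update with a Gaussian policy and linear function approximation; the feature map, step sizes, and dimensions are illustrative assumptions rather than the chapter's exact algorithm.

```python
import numpy as np

# Minimal actor-critic sketch with a Gaussian policy and linear function
# approximation. Feature map, step sizes, and dimensions are illustrative.

def features(state):
    # Illustrative polynomial features for a one-dimensional continuous state.
    return np.array([1.0, state, state ** 2])

def actor_critic_step(w, theta, state, action, reward, next_state,
                      sigma=0.5, gamma=0.99, alpha_w=0.1, alpha_theta=0.01):
    phi = features(state)
    # Critic: TD(0) error and value-weight update.
    delta = reward + gamma * np.dot(w, features(next_state)) - np.dot(w, phi)
    w = w + alpha_w * delta * phi
    # Actor: policy-gradient step on the Gaussian mean mu(s) = theta . phi(s),
    # using grad log pi(a|s) = (a - mu) / sigma^2 * phi(s).
    mu = np.dot(theta, phi)
    theta = theta + alpha_theta * delta * (action - mu) / sigma ** 2 * phi
    return w, theta
```

The critic follows a standard TD(0) rule, while the actor takes a policy-gradient step on the mean of the Gaussian policy, scaled by the same TD error.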
114 Citations
Reinforcement learning in continuous state- and action-space
- Computer Science
- 2014
This thesis investigates methods to select the optimal action when artificial neural networks are used to approximate the value function, applying numerical optimization techniques, and proposes two novel algorithms based on two alternative action-selection methods.
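As a hedged illustration of the general idea (not the thesis's exact methods): when a learned approximator gives Q(s, a) for continuous a, a usable action can be found by numerical optimization over the action range, e.g. a coarse grid search followed by local refinement. The name q_function and the ranges below are illustrative assumptions.

```python
import numpy as np

# Sketch of continuous action selection by numerical optimization over a
# learned action-value approximator: coarse grid search, then refinement
# around the best candidate. q_function(state, action) is a stand-in for
# any approximator, e.g. a neural network.

def select_action(q_function, state, a_min=-1.0, a_max=1.0,
                  coarse=21, refine=21):
    candidates = np.linspace(a_min, a_max, coarse)
    best = candidates[np.argmax([q_function(state, a) for a in candidates])]
    step = (a_max - a_min) / (coarse - 1)
    fine = np.linspace(best - step, best + step, refine)
    return fine[np.argmax([q_function(state, a) for a in fine])]
```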
Qualitative Transfer for Reinforcement Learning with Continuous State and Action Spaces
- Computer Science, CIARP
- 2013
A novel approach to transfer knowledge between reinforcement-learning tasks with continuous states and actions, where the transition and policy functions are approximated by Gaussian processes (GPs), using the GPs' hyper-parameters to represent the state-transition function in the source task.
Efficient reinforcement learning in continuous state and action spaces with Dyna and policy approximation
- Computer Science, Frontiers of Computer Science
- 2017
A novel Dyna variant, called Dyna-LSTD-PA, is proposed to handle problems with continuous action spaces; it outperforms two representative methods in terms of convergence rate, success rate, and stability on four benchmark RL problems.
A scalable species-based genetic algorithm for reinforcement learning problems
- Computer Science, The Knowledge Engineering Review
- 2022
A novel genetic-algorithm variant called SP-GA is proposed, which uses a species-inspired weight-initialization strategy and trains a population of deep neural networks, each estimating the Q-function for the RL problem.
Directed Exploration in Black-Box Optimization for Multi-Objective Reinforcement Learning
- Computer Science, Int. J. Inf. Technol. Decis. Mak.
- 2019
Usually, real-world problems involve the optimization of multiple, possibly conflicting, objectives. These problems may be addressed by multi-objective reinforcement learning (MORL) techniques. MORL…
Off-Policy Neural Fitted Actor-Critic
- Computer Science, NIPS 2016
- 2016
A new off-policy, offline, model-free, actor-critic reinforcement-learning algorithm for environments with continuous states and actions is presented, which allows a trade-off between data efficiency and scalability.
Transfer Learning for continuous State and Action Spaces
- Computer Science, Int. J. Pattern Recognit. Artif. Intell.
- 2014
This work presents a novel approach to transfer knowledge between tasks in a reinforcement learning (RL) framework with continuous states and actions, where the transition and policy functions are approximated by Gaussian processes.
Incremental reinforcement learning for multi-objective robotic tasks
- Computer Science, Knowledge and Information Systems
- 2016
A different approach to learning problems with more than one objective, consisting of a cyclical process of small perturbations and stabilizations, which tries to avoid degrading the system's performance while it searches for a new valid policy that also optimizes a sub-objective.
Continuous action reinforcement learning for control-affine systems with unknown dynamics
- Computer Science, IEEE/CAA Journal of Automatica Sinica
- 2014
This article proposes sampling-based task learning for control-affine nonlinear systems through the combined learning of both state-value and action-value functions in a model-free approximate value-iteration setting with continuous inputs.
The Challenges of Reinforcement Learning in Robotics and Optimal Control
- Computer Science, AISI
- 2016
This paper discusses a widely used RL algorithm, Q-learning, and how it can be adapted to work in continuous state and action spaces; it also covers methods for computing rewards that yield an adaptive optimal controller and accelerate the learning process, and finally safe-exploration approaches.
199 References
Binary action search for learning continuous-action control policies
- Computer Science, ICML '09
- 2009
Binary Action Search eliminates the restrictive modification steps of Adaptive Action Modification, requires no temporal action locality in the domain, and can be combined with any discrete-action reinforcement-learning algorithm to learn continuous-action policies.
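A rough sketch of the binary-action-search idea under stated assumptions: a learned binary decision repeatedly halves the admissible action interval, so any discrete-action learner can drive continuous action selection. The binary_policy callable is an illustrative stand-in for that learned decision, not the paper's exact formulation.

```python
# Sketch of binary action search: a binary policy, trained by any
# discrete-action RL algorithm on the augmented (state, candidate) input,
# repeatedly halves the action interval until the desired resolution.

def binary_action_search(state, binary_policy, a_min, a_max, steps=10):
    lo, hi = a_min, a_max
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        # The binary policy answers: should the final action lie above mid?
        if binary_policy(state, mid):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```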
Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces
- Computer Science, Adapt. Behav.
- 1997
This article proposes a simple and modular technique that can be used to implement function approximators with nonuniform degrees of resolution so that the value function can be represented with higher accuracy in important regions of the state and action spaces.
Using continuous action spaces to solve discrete problems
- Computer Science, 2009 International Joint Conference on Neural Networks
- 2009
This work shows that Cacla retains much better performance when the action space is changed by removing some actions after some learning, even though its continuous actions are rounded to actions in the same finite action space, which may contain only a small number of actions.
Reinforcement Learning in Continuous Action Spaces
- Computer Science, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning
- 2007
This work presents a new class of algorithms named continuous actor critic learning automaton (CACLA) that can handle continuous states and actions and shows that CACLA performs much better than the other algorithms, especially when it is combined with a Gaussian exploration method.
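A minimal sketch of the Cacla-style update with linear approximators and Gaussian exploration, as described in the cited work: the critic performs a TD(0) update, and the actor is moved toward the explored action only when the temporal-difference error is positive. Step sizes and the feature representation are illustrative assumptions.

```python
import numpy as np

# Sketch of the core CACLA rule: critic = TD(0); actor moves toward the
# explored action only if the TD error is positive. phi and phi_next are
# feature vectors of the current and next state; all constants illustrative.

def cacla_step(w, theta, phi, phi_next, action, reward,
               gamma=0.99, alpha=0.1, beta=0.01):
    delta = reward + gamma * np.dot(w, phi_next) - np.dot(w, phi)
    w = w + alpha * delta * phi                 # critic update
    if delta > 0:                               # actor updated only on improvement
        mu = np.dot(theta, phi)
        theta = theta + beta * (action - mu) * phi
    return w, theta
```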
Infinite-Horizon Policy-Gradient Estimation
- Computer Science, J. Artif. Intell. Res.
- 2001
GPOMDP, a simulation-based algorithm for generating a biased estimate of the gradient of the average reward in Partially Observable Markov Decision Processes (POMDPs) controlled by parameterized stochastic policies, is introduced.
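A compact sketch of the GPOMDP estimator under stated assumptions: an eligibility trace of policy score functions, discounted by a factor beta, is combined with observed rewards and averaged over time to give a (biased) gradient estimate. The grad_log_pi callable and the trajectory format are illustrative.

```python
import numpy as np

# Sketch of the GPOMDP gradient estimator: a beta-discounted eligibility
# trace of score functions, multiplied by observed rewards and averaged
# online over the trajectory. grad_log_pi(state, action) is a stand-in for
# the score function of the parameterized policy.

def gpomdp_estimate(trajectory, grad_log_pi, beta=0.95, dim=3):
    z = np.zeros(dim)       # eligibility trace of score functions
    grad = np.zeros(dim)    # running average of reward-weighted traces
    for t, (state, action, reward) in enumerate(trajectory):
        z = beta * z + grad_log_pi(state, action)
        grad = grad + (reward * z - grad) / (t + 1)
    return grad
```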
Reinforcement learning for robots using neural networks
- Computer Science
- 1992
This dissertation concludes that it is possible to build artificial agents that can acquire complex control policies effectively by reinforcement learning, enabling its application to complex robot-learning problems.
Algorithms for Reinforcement Learning
- Computer Science, Algorithms for Reinforcement Learning
- 2010
This book focuses on reinforcement-learning algorithms that build on the powerful theory of dynamic programming; it gives a fairly comprehensive catalog of learning problems and describes the core ideas, followed by a discussion of their theoretical properties and limitations.
Off-Policy Temporal Difference Learning with Function Approximation
- Computer Science, ICML
- 2001
The first algorithm for off-policy temporal-difference learning that is stable with linear function approximation is introduced, and it is proved that, given training under any ε-soft policy, the algorithm converges with probability 1 to a close approximation of the action-value function for an arbitrary target policy.
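For intuition only, a sketch of the importance-sampling correction that off-policy TD methods of this kind rely on: the TD update is weighted by the target-to-behaviour probability ratio rho. This shows the basic idea rather than the paper's full algorithm, whose stability result rests on weighting complete trajectories.

```python
import numpy as np

# Sketch of one importance-weighted TD(0) update with linear function
# approximation. rho is the target/behaviour probability ratio of the action
# actually taken; step size and names are illustrative assumptions.

def off_policy_td_step(w, phi, phi_next, reward, rho, alpha=0.05, gamma=0.99):
    delta = reward + gamma * np.dot(w, phi_next) - np.dot(w, phi)
    return w + alpha * rho * delta * phi
```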
Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding
- Computer Science, NIPS
- 1995
It is concluded that reinforcement learning can work robustly in conjunction with function approximators, and that there is little justification at present for avoiding the case of general λ.
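A small sketch of sparse coarse coding (tile coding) for a one-dimensional state, assuming a handful of offset tilings: each tiling activates exactly one binary feature, and the value estimate is the sum of the active tiles' weights. Tiling counts, ranges, and offsets are illustrative assumptions.

```python
import numpy as np

# Sketch of sparse coarse coding (tile coding) for a 1-D state in [lo, hi]:
# each of several offset tilings activates exactly one tile, and the value
# estimate is the sum of the weights of the active tiles.

def active_tiles(state, n_tilings=8, tiles_per_tiling=10, lo=0.0, hi=1.0):
    width = (hi - lo) / tiles_per_tiling
    indices = []
    for k in range(n_tilings):
        offset = k * width / n_tilings          # shift each tiling slightly
        idx = int((state - lo + offset) / width)
        idx = min(max(idx, 0), tiles_per_tiling)  # clip to the valid tile range
        indices.append(k * (tiles_per_tiling + 1) + idx)
    return indices

def value(state, weights):
    return sum(weights[i] for i in active_tiles(state))

weights = np.zeros(8 * (10 + 1))   # one weight per tile across all tilings
```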