Corpus ID: 225072922

How to Make Deep RL Work in Practice

Nirnai Rao, Elie Aljalbout, Axel Sauer, Sami Haddadin
In recent years, challenging control problems have become solvable with deep reinforcement learning (RL). To use RL in large-scale real-world applications, a certain degree of reliability in performance is necessary. Reported results of state-of-the-art algorithms are often difficult to reproduce. One reason is that certain implementation details influence performance significantly. Commonly, these details are not highlighted as important techniques for achieving state-of-the-art …
Learning Vision-based Reactive Policies for Obstacle Avoidance
The proposed method is shown to efficiently learn stable obstacle-avoidance strategies at a high success rate, while maintaining the closed-loop responsiveness required for critical applications such as human-robot interaction.
A few lessons learned in reinforcement learning for quadcopter attitude control
This paper discusses theoretical as well as practical aspects of training neural networks to control a Crazyflie 2.0 drone, and thoroughly describes the choices of training algorithm, neural-network architecture, hyperparameters, observation space, etc.
A learning gap between neuroscience and reinforcement learning
A T-maze task from neuroscience is extended for use with reinforcement learning algorithms, and it is shown that state-of-the-art algorithms are not capable of solving this problem.
RLOps: Development Life-cycle of Reinforcement Learning Aided Open RAN
Peizheng Li, Jonathan Thomas, +8 authors, R. Piechocki · Computer Science · ArXiv · 2021
Radio access network (RAN) technologies continue to witness massive growth, with Open RAN gaining the most recent momentum. In the O-RAN specifications, the RAN intelligent controller (RIC) serves as …
Reinforcement Learning with Formal Performance Metrics for Quadcopter Attitude Control under Non-nominal Contexts
A robust form of signal temporal logic is developed to quantitatively evaluate the vehicle's behavior and measure the performance of controllers, in order to draw conclusions on practical controller design by reinforcement learning.


Deep Reinforcement Learning that Matters
Challenges posed by reproducibility, proper experimental techniques, and reporting procedures are investigated, and guidelines to make future results in deep RL more reproducible are suggested.
Benchmarking Deep Reinforcement Learning for Continuous Control
This work presents a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure.
Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control
The significance of hyperparameters in policy gradients for continuous control, general variance in the algorithms, and reproducibility of reported results are investigated, and guidelines are provided on reporting novel results as comparisons against baseline methods.
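A recurring recommendation across these reproducibility studies is to report results over several random seeds rather than a single run. A minimal sketch of that practice, using a hypothetical `train_fn` stand-in for an actual training routine:

```python
import statistics

def evaluate_across_seeds(train_fn, seeds=(0, 1, 2, 3, 4)):
    """Run the same training routine under several random seeds and
    report mean and standard deviation of the final return, rather
    than a single (possibly lucky) run."""
    returns = [train_fn(seed) for seed in seeds]
    return statistics.mean(returns), statistics.stdev(returns)

# Toy stand-in for a training run whose outcome depends on the seed.
mean, std = evaluate_across_seeds(lambda seed: 100.0 + 10.0 * (seed % 3))
print(f"final return: {mean:.1f} +/- {std:.1f}")
```

The toy objective here is only illustrative; in practice `train_fn` would seed the environment, the network initialization, and the data sampler, then return the evaluation score.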
Continuous control with deep reinforcement learning
This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
Q-Learning for Continuous Actions with Cross-Entropy Guided Policies
This work proposes a novel approach, called Cross-Entropy Guided Policies, or CGP, that aims to combine the stability and performance of iterative sampling policies with the low computational cost of a policy network.
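The iterative sampling component referred to here is the cross-entropy method: repeatedly sample actions, keep the top scorers under the Q-function, and refit the sampling distribution. A rough sketch for a 1-D action space with a toy quadratic Q-function (not the paper's exact formulation):

```python
import random

def cem_argmax(q_fn, mean=0.0, std=1.0, iters=5, samples=64, elite=8):
    """Cross-entropy search for an action that maximizes q_fn over a
    1-D continuous action space: sample, keep the elites, refit."""
    for _ in range(iters):
        actions = [random.gauss(mean, std) for _ in range(samples)]
        elites = sorted(actions, key=q_fn, reverse=True)[:elite]
        mean = sum(elites) / elite
        std = max(1e-3, (sum((a - mean) ** 2 for a in elites) / elite) ** 0.5)
    return mean

random.seed(0)
# Toy Q-function peaked at a = 0.7; the search should land near it.
best = cem_argmax(lambda a: -(a - 0.7) ** 2)
```

In CGP, a search of this kind is used during training, while a cheap policy network is distilled to imitate its output for fast inference.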
The Mirage of Action-Dependent Baselines in Reinforcement Learning
The variance of the policy gradient estimator is decomposed, and it is shown numerically that learned state-action-dependent baselines do not, in fact, reduce variance over a state-dependent baseline in commonly tested benchmark domains.
Addressing Function Approximation Error in Actor-Critic Methods
This paper builds on Double Q-learning by taking the minimum value between a pair of critics to limit overestimation, and draws the connection between target networks and overestimation bias.
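The clipped double-Q idea described above reduces, for a single transition, to bootstrapping from the smaller of the two critic estimates. A minimal sketch (scalar values standing in for critic network outputs):

```python
def clipped_double_q_target(reward, done, q1_next, q2_next, gamma=0.99):
    """TD3-style target: bootstrap from the minimum of two critic
    estimates to curb the overestimation bias of a single critic."""
    bootstrap = min(q1_next, q2_next)
    return reward + gamma * (1.0 - done) * bootstrap

# When the two critics disagree, the pessimistic estimate is used:
target = clipped_double_q_target(reward=1.0, done=0.0, q1_next=10.0, q2_next=8.0)
# target = 1.0 + 0.99 * 8.0 = 8.92
```

Both critics are then regressed toward this shared target, so neither can inflate its own bootstrap value.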
RE-EVALUATE: Reproducibility in Evaluating Reinforcement Learning Algorithms
This work highlights key differences in evaluation in RL compared to supervised learning, proposes an evaluation pipeline that can be decoupled from the algorithm code, and hopes such a pipeline can be standardized as a step towards robust and reproducible research in RL.
Playing Atari with Deep Reinforcement Learning
This work presents the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning, which outperforms all previous approaches on six of the games and surpasses a human expert on three of them.
Time Limits in Reinforcement Learning
This paper provides a formal account of how time limits can be handled in each of the two cases, and explains why failing to do so can cause state aliasing and invalidation of experience replay, leading to suboptimal policies and training instability.
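One practical consequence of this distinction is in computing TD targets: an episode cut off by a time limit would have continued, so its value should still be bootstrapped, unlike a genuine terminal state. A simplified one-step sketch (hypothetical flag names, not any particular library's API):

```python
def td_target(reward, terminal, timeout, v_next, gamma=0.99):
    """Treat a time-limit truncation differently from a true terminal
    state: on timeout the episode would have continued, so we still
    bootstrap from V(s'); only genuine termination zeroes the tail."""
    if terminal and not timeout:
        return reward               # the environment actually ended here
    return reward + gamma * v_next  # truncated by the time limit: bootstrap

# A crash ends the return; a time-limit cutoff does not.
crash = td_target(reward=-1.0, terminal=True, timeout=False, v_next=5.0)
cutoff = td_target(reward=-1.0, terminal=True, timeout=True, v_next=5.0)
```

Conflating the two cases makes identical states carry different targets depending only on elapsed time, which is the state aliasing the paper warns about.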