Corpus ID: 202539277

A Survey on Reproducibility by Evaluating Deep Reinforcement Learning Algorithms on Real-World Robots

@inproceedings{Lynnerup2019ASO,
  title={A Survey on Reproducibility by Evaluating Deep Reinforcement Learning Algorithms on Real-World Robots},
  author={Nicolai A. Lynnerup and Laura Nolling and Rasmus Hasle and John Hallam},
  booktitle={Conference on Robot Learning},
  year={2019}
}
As reinforcement learning (RL) achieves more success in solving complex tasks, more care is needed to ensure that RL research is reproducible and that algorithms herein can be compared easily and fairly with minimal bias. RL results are, however, notoriously hard to reproduce due to the algorithms' intrinsic variance, the environments' stochasticity, and numerous (potentially unreported) hyper-parameters. In this work we investigate the many issues leading to irreproducible research and how to… 

Deep Reinforcement Learning at the Edge of the Statistical Precipice

This paper argues that reliable evaluation in the few-run deep RL regime cannot ignore the uncertainty in results without running the risk of slowing down progress in the field, and advocates for reporting interval estimates of aggregate performance and proposing performance profiles to account for the variability in results.
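As a concrete illustration of interval estimates over a handful of runs, the following minimal Python sketch (not the paper's own tooling) bootstraps a confidence interval for the mean of per-run scores; the score values and number of resamples are made up for the example.

import numpy as np

def bootstrap_ci(scores, n_resamples=10000, alpha=0.05, rng=None):
    """Percentile bootstrap confidence interval for the mean of per-run scores."""
    rng = np.random.default_rng(rng)
    scores = np.asarray(scores, dtype=float)
    # Resample runs with replacement and recompute the aggregate statistic.
    means = np.array([
        rng.choice(scores, size=scores.size, replace=True).mean()
        for _ in range(n_resamples)
    ])
    lo, hi = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return scores.mean(), (lo, hi)

# Example: final returns from 10 hypothetical training runs of one algorithm.
runs = [812, 640, 905, 770, 698, 951, 720, 684, 803, 756]
mean, (lo, hi) = bootstrap_ci(runs, rng=0)
print(f"mean = {mean:.1f}, 95% CI = [{lo:.1f}, {hi:.1f}]")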

Is High Variance Unavoidable in RL? A Case Study in Continuous Control

It is argued that developing low-variance agents is an important goal for the RL community, achievable via simple modifications, and one cause of the observed outlier runs is identified as unstable network parametrization that leads to saturating nonlinearities.

Benchmarking Sim-2-Real Algorithms on Real-World Platforms

Learning from simulation is particularly useful, because it is typically cheaper and safer than learning on real-world systems. Nevertheless, the transfer of learned behavior from the simulation to…

Towards Augmented Microscopy with Reinforcement Learning-Enhanced Workflows

The results highlight that by taking advantage of RL, microscope operations can be automated without the need for extensive algorithm design, taking another step toward augmenting electron microscopy with machine learning methods.

Is High Variance Unavoidable in RL? A Case Study in Continuous Control

This paper focuses on a popular continuous control setup with high variance, namely continuous control from pixels with an actor-critic agent, proposes several methods to decrease variance, and argues that developing low-variance agents is an important goal for the RL community.

Improved Path Planning for Indoor Patrol Robot Based on Deep Reinforcement Learning

An improved deep reinforcement learning algorithm based on Pan/Tilt/Zoom (PTZ) image information is proposed to address the poor exploration ability and slow convergence of traditional deep reinforcement learning in the navigation task of a patrol robot along specified indoor routes.

Delta Hedging of Derivatives using Deep Reinforcement Learning

The results indicate that the hedging strategies based on Reinforcement Learning outperform the benchmark strategies and are suitable for traders taking real-life hedging decisions, even when the networks are trained on synthetic (but versatile) data.

References

Showing 1-10 of 37 references

RE-EVALUATE: Reproducibility in Evaluating Reinforcement Learning Algorithms

This work highlights key differences between evaluation in RL and in supervised learning, proposes an evaluation pipeline that can be decoupled from the algorithm code, and hopes such a pipeline can be standardized as a step towards robust and reproducible research in RL.
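To illustrate what decoupling evaluation from algorithm code can look like, here is a minimal sketch in which a hypothetical evaluate helper depends only on a generic policy callable and an environment reset/step interface; the names and interface are assumptions for illustration, not the paper's actual pipeline.

from typing import Any, Callable, Protocol, Tuple

class Env(Protocol):
    def reset(self) -> Any: ...
    def step(self, action) -> Tuple[Any, float, bool]: ...

def evaluate(policy: Callable, env: Env, n_episodes: int = 10) -> list:
    """Run the policy for n_episodes and return per-episode returns.
    Depends only on a generic policy/step interface, not on training code."""
    returns = []
    for _ in range(n_episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done = env.step(policy(obs))
            total += reward
        returns.append(total)
    return returns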

Deep Reinforcement Learning that Matters

Challenges posed by reproducibility, proper experimental techniques, and reporting procedures are investigated and guidelines to make future results in deep RL more reproducible are suggested.
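One guideline commonly drawn from this work is to run several independent random seeds and compare algorithms with a significance test rather than with single learning curves. A minimal sketch, assuming per-seed final returns are already collected (the numbers and the choice of Welch's t-test are illustrative):

from scipy import stats

# Hypothetical final returns of two algorithms, one value per random seed.
algo_a = [3125.0, 2410.0, 3340.0, 2875.0, 2990.0]
algo_b = [2705.0, 2630.0, 2810.0, 2555.0, 2740.0]

# Welch's t-test (unequal variances) on per-seed returns.
t_stat, p_value = stats.ttest_ind(algo_a, algo_b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")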

Benchmarking Reinforcement Learning Algorithms on Real-World Robots

This work introduces several reinforcement learning tasks with multiple commercially available robots that present varying levels of learning difficulty, setup, and repeatability, tests the learning performance of off-the-shelf implementations of four reinforcement learning algorithms, and analyzes their sensitivity to hyper-parameters to determine their readiness for applications in various real-world tasks.

Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control

The significance of hyper-parameters in policy gradients for continuous control, the general variance in the algorithms, and the reproducibility of reported results are investigated, and guidelines on reporting novel results as comparisons against baseline methods are provided.

Setting up a Reinforcement Learning Task with a Real-World Robot

It is found that learning performance can be highly sensitive to the setup, and thus oversights and omissions in setup details can make effective learning, reproducibility, and fair comparison hard.

Benchmarking Deep Reinforcement Learning for Continuous Control

This work presents a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure.

A Study on Overfitting in Deep Reinforcement Learning

This paper conducts a systematic study of standard RL agents and finds that they could overfit in various ways and calls for more principled and careful evaluation protocols in RL.

Continuous control with deep reinforcement learning

This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
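To make the actor-critic structure concrete, below is a minimal PyTorch sketch of the deterministic-policy-gradient actor update; the network sizes, batch, and variable names are illustrative, and the full algorithm's target networks, replay buffer, and exploration noise are omitted.

import torch
import torch.nn as nn

obs_dim, act_dim = 8, 2

# Deterministic actor mu(s) and critic Q(s, a).
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)

obs = torch.randn(32, obs_dim)           # a batch of observations
actions = actor(obs)                     # continuous actions in [-1, 1]
q_values = critic(torch.cat([obs, actions], dim=-1))

# Deterministic policy gradient: ascend Q(s, mu(s)) w.r.t. the actor parameters.
actor_loss = -q_values.mean()
actor_opt.zero_grad()
actor_loss.backward()
actor_opt.step()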

Reinforcement Learning with Deep Energy-Based Policies

A method for learning expressive energy-based policies for continuous states and actions, which has previously been feasible only in tabular domains, is proposed, and a new algorithm, called soft Q-learning, which expresses the optimal policy via a Boltzmann distribution, is applied.
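The Boltzmann policy mentioned here takes the form pi(a|s) proportional to exp(Q(s,a)/alpha). A minimal sketch for a finite set of candidate actions (the Q-values and temperature are made up, and the actual method handles continuous actions):

import numpy as np

def boltzmann_policy(q_values, alpha=1.0):
    """Softmax over Q-values: pi(a|s) proportional to exp(Q(s,a) / alpha)."""
    logits = np.asarray(q_values, dtype=float) / alpha
    logits -= logits.max()               # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Example: Q-values for four candidate actions in one state.
print(boltzmann_policy([1.0, 2.0, 0.5, 1.5], alpha=0.5))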

Proximal Policy Optimization Algorithms

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent.
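In PPO the "surrogate" objective is commonly used in its clipped form, L = E[min(r_t * A_t, clip(r_t, 1-eps, 1+eps) * A_t)], where r_t is the new-to-old policy probability ratio. A minimal PyTorch sketch of that loss (the per-timestep values and clip range are illustrative):

import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate objective, negated so it can be minimized."""
    ratio = torch.exp(new_log_probs - old_log_probs)        # r_t = pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Example with made-up per-timestep quantities.
new_lp = torch.tensor([-0.9, -1.2, -0.7])
old_lp = torch.tensor([-1.0, -1.0, -1.0])
adv = torch.tensor([0.5, -0.3, 1.2])
print(ppo_clip_loss(new_lp, old_lp, adv))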