• Corpus ID: 237353084

Deep Reinforcement Learning at the Edge of the Statistical Precipice

@inproceedings{agarwal2021deep,
  title={Deep Reinforcement Learning at the Edge of the Statistical Precipice},
  author={Rishabh Agarwal and Max Schwarzer and Pablo Samuel Castro and Aaron C. Courville and Marc G. Bellemare},
  booktitle={Neural Information Processing Systems},
  year={2021}
}
Deep reinforcement learning (RL) algorithms are predominantly evaluated by comparing their relative performance on a large suite of tasks. Most published results on deep RL benchmarks compare point estimates of aggregate performance such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs. Beginning with the Arcade Learning Environment (ALE), the shift towards computationally-demanding benchmarks has led to the… 
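The abstract's concern can be illustrated with a small sketch using hypothetical scores and plain NumPy: point estimates such as the mean and median hide run-to-run uncertainty, which a stratified bootstrap over training runs makes explicit. The interquartile mean (IQM) shown here is the robust aggregate the paper recommends; the array shapes, values, and number of bootstrap resamples are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical benchmark results: 10 training runs x 26 tasks,
# normalized so that 1.0 = human-level performance.
scores = rng.normal(loc=1.0, scale=0.3, size=(10, 26))

def iqm(s):
    """Interquartile mean: mean of the middle 50% of all run-task scores."""
    flat = np.sort(s.ravel())
    q = flat.size // 4
    return flat[q: flat.size - q].mean()

point_estimates = {
    "mean": scores.mean(),
    "median": np.median(scores.mean(axis=0)),  # median of per-task means
    "iqm": iqm(scores),
}

# Bootstrap over runs: resample whole training runs with replacement, so
# each run's task scores stay together, then read off a 95% interval.
boot = np.array([
    iqm(scores[rng.integers(0, scores.shape[0], scores.shape[0])])
    for _ in range(2000)
])
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])
```

Reporting the interval `[ci_low, ci_high]` alongside the point estimate conveys how much the aggregate score could vary under a different draw of the same finite number of runs.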

The Impact of Task Underspecification in Evaluating Deep Reinforcement Learning

This article augments DRL evaluations to consider parameterized families of MDPs, and shows that in comparison to evaluating DRL methods on select MDP instances, evaluating the MDP family often yields a substantially different relative ranking of methods, casting doubt on what methods should be considered state-of-the-art.

Mildly Conservative Q-Learning for Offline Reinforcement Learning

This paper proposes Mildly Conservative Q-learning (MCQ), where OOD actions are actively trained by assigning them proper pseudo Q-values, and theoretically shows that MCQ induces a policy that behaves at least as well as the behavior policy and that no erroneous overestimation occurs for OOD actions.

An Empirical Study of Implicit Regularization in Deep Offline RL

It is observed that a direct association between effective rank and performance exists only in restricted settings and disappears in more extensive hyperparameter sweeps, and it is found that bootstrapping alone is insufficient to explain the collapse of the effective rank.

Adaptively Calibrated Critic Estimates for Deep Reinforcement Learning

A general method called Adaptively Calibrated Critics (ACC) is proposed that uses the most recent high variance but unbiased on-policy rollouts to alleviate the bias of the low variance temporal difference targets.

Bridging the Gap Between Offline and Online Reinforcement Learning Evaluation Methodologies

A sequential approach is proposed to evaluate offline RL algorithms as a function of the training-set size, and thus by their data efficiency; this provides valuable insights into the data efficiency of the learning process and the robustness of algorithms to distribution changes in the dataset.

The Primacy Bias in Deep Reinforcement Learning

This work proposes a simple yet generally-applicable mechanism that tackles the primacy bias of deep reinforcement learning algorithms by periodically resetting a part of the agent.
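A minimal sketch of the reset mechanism described above, with all names and the reset schedule assumed: only the output head of a toy Q-network is re-initialized, while the rest of the parameters (and, in a full agent, the replay buffer) are kept.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer Q-network parameters. The mechanism periodically
# re-initializes part of the agent (here: the output head) while the
# body of the network and the replay buffer are preserved.
params = {
    "body": rng.normal(size=(8, 64)),
    "head": rng.normal(size=(64, 4)),
}

RESET_EVERY = 200_000  # gradient steps between resets (assumed schedule)

def maybe_reset(step: int) -> None:
    """Re-initialize only the head parameters on the reset schedule."""
    if step > 0 and step % RESET_EVERY == 0:
        params["head"] = rng.normal(size=params["head"].shape)
```

The preserved replay buffer lets the re-initialized head quickly relearn from data collected before the reset, which is what counteracts overfitting to early experience.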

Pretraining in Deep Reinforcement Learning: A Survey

This survey seeks to systematically review existing works in pretraining for deep reinforcement learning, provide a taxonomy of these methods, discuss each sub-field, and bring attention to open problems and future directions.

SFP: State-free Priors for Exploration in Off-Policy Reinforcement Learning

This work introduces state-free priors, which directly model temporal consistency in demonstrated trajectories, and are capable of driving exploration in complex tasks, even when trained on data collected on simpler tasks, and introduces a novel integration scheme for action priors in off-policy reinforcement learning by dynamically sampling actions from a probabilistic mixture of policy and action prior.
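The integration scheme in the last clause can be sketched generically; note the paper adapts the mixture weight dynamically, so the fixed `mix_prob` below is a simplification and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_action(policy_sample, prior_sample, mix_prob=0.5):
    """Draw an action from a probabilistic mixture of the current policy
    and an action prior. `mix_prob` is the probability of querying the
    prior instead of the policy on this step."""
    if rng.random() < mix_prob:
        return prior_sample()
    return policy_sample()

# Example: a policy that always picks action 0, a prior that picks action 1.
actions = [sample_action(lambda: 0, lambda: 1) for _ in range(1000)]
```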

Democratizing RL Research by Reusing Prior Computation

  • 2022
As deep RL research moves towards more complex and challenging benchmarks, the computational barrier to entry in RL research will become substantially higher, due to the inefficiency of tabula rasa RL.

Reward Reports for Reinforcement Learning

Taking inspiration from various contributions to the technical literature on reinforcement learning, Reward Reports are outlined as living documents that track updates to design choices and assumptions behind what a particular automated system is optimizing for.



Deep Reinforcement Learning that Matters

Challenges posed by reproducibility, proper experimental techniques, and reporting procedures are investigated and guidelines to make future results in deep RL more reproducible are suggested.

Q-Value Weighted Regression: Reinforcement Learning with Limited Data

This work builds upon Advantage Weighted Regression (AWR), an off-policy actor-critic algorithm that performs very well on continuous control tasks, but has low sample efficiency and struggles with high-dimensional observation spaces.

Distributional Reinforcement Learning with Quantile Regression

This paper examines methods of learning the value distribution instead of the value function in reinforcement learning, and presents a novel distributional reinforcement learning algorithm consistent with the theoretical formulation.
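The core of the quantile-regression approach can be sketched as the asymmetric Huber loss below. This is a sketch of the loss only, not a full agent; the shapes and the `kappa` threshold are assumptions.

```python
import numpy as np

def quantile_huber_loss(pred_quantiles, target_samples, kappa=1.0):
    """Quantile-regression loss for learning a value *distribution*.

    pred_quantiles: (N,) predicted quantile values of the return.
    target_samples: (M,) target values (e.g. target-network quantiles
    after a Bellman backup).
    """
    N = len(pred_quantiles)
    taus = (np.arange(N) + 0.5) / N                 # quantile midpoints
    # Pairwise TD errors u_ij = target_j - pred_i.
    u = target_samples[None, :] - pred_quantiles[:, None]
    # Huber loss: quadratic near zero, linear beyond kappa.
    huber = np.where(np.abs(u) <= kappa,
                     0.5 * u ** 2,
                     kappa * (np.abs(u) - 0.5 * kappa))
    # Asymmetric weighting pushes each output toward its own quantile.
    weight = np.abs(taus[:, None] - (u < 0).astype(float))
    return (weight * huber / kappa).mean()
```

Minimizing this loss drives each of the N outputs toward a distinct quantile of the return distribution, rather than all of them toward its mean.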

A Survey on Reproducibility by Evaluating Deep Reinforcement Learning Algorithms on Real-World Robots

A rigorous and standardised evaluation approach is shown for easing the process of documentation, evaluation and fair comparison of different algorithms, where the importance of choosing the right measurement metrics and conducting proper statistics on the results is emphasised, for unbiased reporting of the results.

SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning

SUNRISE is a simple unified ensemble method, which is compatible with various off-policy RL algorithms and significantly improves the performance of existing off-policy RL algorithms, such as Soft Actor-Critic and Rainbow DQN, for both continuous and discrete control tasks on both low-dimensional and high-dimensional environments.

Munchausen Reinforcement Learning

It is shown that slightly modifying Deep Q-Network (DQN) in that way provides an agent that is competitive with distributional methods on Atari games, without making use of distributional RL, n-step returns or prioritized replay.
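The modification is small enough to show directly. Below is a sketch of the per-transition regression target, with hyperparameter values assumed: the scaled log-policy of the taken action is added to the reward, on top of a soft (entropy-regularized) bootstrap. The paper additionally clips the log-policy term, which this sketch omits.

```python
import numpy as np

def softmax(x, tau):
    """Softmax of x / tau, computed stably."""
    z = x / tau
    z -= z.max()
    e = np.exp(z)
    return e / e.sum()

def m_dqn_target(r, q_next, q_curr, a, alpha=0.9, tau=0.03, gamma=0.99):
    """Munchausen-DQN regression target for one transition (sketch).

    q_curr / q_next: (A,) Q-values at the current / next state,
    a: index of the action taken, r: reward.
    """
    pi_curr = softmax(q_curr, tau)
    pi_next = softmax(q_next, tau)
    # The "Munchausen" bonus: scaled log-probability of the taken action.
    munchausen = alpha * tau * np.log(pi_curr[a] + 1e-8)
    # Soft bootstrap: expected next Q-value plus an entropy term.
    soft_v = np.sum(pi_next * (q_next - tau * np.log(pi_next + 1e-8)))
    return r + munchausen + gamma * soft_v
```

With `alpha=0` and a small `tau`, the target reduces to (approximately) the standard DQN target `r + gamma * max(q_next)`.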

Let's Play Again: Variability of Deep Reinforcement Learning Agents in Atari Environments

This work makes the case for reporting post-training agent performance as a distribution, rather than a point estimate, and demonstrates the variability of common agents used in the popular OpenAI Baselines repository.

Reinforcement Learning with Unsupervised Auxiliary Tasks

This paper significantly outperforms the previous state-of-the-art on Atari, averaging 880% expert human performance, and on a challenging suite of first-person, three-dimensional Labyrinth tasks, leading to a mean speedup in learning of 10× and averaging 87% expert human performance on Labyrinth.

Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding

It is concluded that reinforcement learning can work robustly in conjunction with function approximators, and that there is little justification at present for avoiding the case of general λ.

SEERL: Sample Efficient Ensemble Reinforcement Learning

It is shown that learning an adequately diverse set of policies is required for a good ensemble, while extreme diversity can prove detrimental to overall performance; this framework is seen to outperform state-of-the-art (SOTA) scores in Atari 2600 and MuJoCo.