• Corpus ID: 237353084

# Deep Reinforcement Learning at the Edge of the Statistical Precipice

@inproceedings{Agarwal2021DeepRL,
title={Deep Reinforcement Learning at the Edge of the Statistical Precipice},
author={Rishabh Agarwal and Max Schwarzer and Pablo Samuel Castro and Aaron C. Courville and Marc G. Bellemare},
booktitle={Neural Information Processing Systems},
year={2021}
}
• Published in Neural Information Processing Systems, 30 August 2021
• Computer Science
Deep reinforcement learning (RL) algorithms are predominantly evaluated by comparing their relative performance on a large suite of tasks. Most published results on deep RL benchmarks compare point estimates of aggregate performance, such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs. Beginning with the Arcade Learning Environment (ALE), the shift towards computationally demanding benchmarks has led to the…
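The aggregate-metric issue described in the abstract can be illustrated with a minimal pure-Python sketch of the interquartile mean (IQM) and a percentile-bootstrap confidence interval over scores. This is a simplified illustration, not the paper's official implementation (the authors recommend stratified bootstrapping over runs and tasks, released as the `rliable` library); all function names here are hypothetical.

```python
import random
import statistics

def iqm(scores):
    """Interquartile mean: the mean of the middle 50% of values,
    more robust than the mean, less wasteful than the median."""
    s = sorted(scores)
    n = len(s)
    lo, hi = n // 4, n - n // 4
    return statistics.mean(s[lo:hi])

def bootstrap_ci(scores, stat=iqm, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for a statistic,
    computed by resampling the runs with replacement."""
    rng = random.Random(seed)
    estimates = sorted(
        stat([rng.choice(scores) for _ in scores]) for _ in range(n_boot)
    )
    lo = estimates[int(alpha / 2 * n_boot)]
    hi = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

Reporting the interval rather than the point estimate makes the run-to-run uncertainty visible, which is the paper's central recommendation.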
145 Citations

### The Impact of Task Underspecification in Evaluating Deep Reinforcement Learning

• Computer Science
• 2022
This article augments DRL evaluations to consider parameterized families of MDPs and shows that, compared to evaluating DRL methods on select MDP instances, evaluating across the MDP family often yields a substantially different relative ranking of methods, casting doubt on which methods should be considered state-of-the-art.

### Mildly Conservative Q-Learning for Offline Reinforcement Learning

• Computer Science
ArXiv
• 2022
This paper proposes Mildly Conservative Q-learning (MCQ), in which OOD actions are actively trained by assigning them proper pseudo Q-values, and theoretically shows that MCQ induces a policy that behaves at least as well as the behavior policy, with no erroneous overestimation for OOD actions.

### Adaptively Calibrated Critic Estimates for Deep Reinforcement Learning

• Computer Science
ArXiv
• 2021
A general method that uses the most recent high-variance but unbiased on-policy rollouts to alleviate the bias of low-variance temporal-difference targets; it sets a new state of the art on the OpenAI Gym continuous-control benchmark among all algorithms that do not tune hyperparameters for each environment.

### An Empirical Study of Implicit Regularization in Deep Offline RL

• Computer Science
ArXiv
• 2022
It is observed that a direct association exists only in restricted settings and disappears in more extensive hyperparameter sweeps, and it is found that bootstrapping alone is insufficient to explain the collapse of the effective rank.

### The Primacy Bias in Deep Reinforcement Learning

• Computer Science
ICML
• 2022
This work proposes a simple yet generally-applicable mechanism that tackles the primacy bias of deep reinforcement learning algorithms by periodically resetting a part of the agent.
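The resetting mechanism summarized above can be sketched in toy form. The class and names below are hypothetical; the actual method re-initializes the final layers of a deep network on a fixed schedule while preserving the replay buffer, so the agent relearns from its accumulated experience rather than from scratch.

```python
import random

class ResettableAgent:
    """Toy agent whose 'head' parameters are periodically re-initialized
    while the replay buffer (accumulated experience) is kept."""

    def __init__(self, n_params=4, reset_every=1000, seed=0):
        self.rng = random.Random(seed)
        self.n_params = n_params
        self.reset_every = reset_every
        self.step_count = 0
        self.replay_buffer = []          # experience survives resets
        self.head = self._init_head()

    def _init_head(self):
        # Fresh random parameters, as at the start of training.
        return [self.rng.uniform(-1, 1) for _ in range(self.n_params)]

    def step(self, transition):
        self.replay_buffer.append(transition)
        self.step_count += 1
        # ... a gradient update on self.head would go here ...
        if self.step_count % self.reset_every == 0:
            self.head = self._init_head()  # forget the head, keep the data
```

The key design choice is the asymmetry: parameters are discarded on a schedule, data never is, which is what lets the agent escape the primacy bias of its earliest experience.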

### Pretraining in Deep Reinforcement Learning: A Survey

• Computer Science
• 2022
This survey seeks to systematically review existing works in pretraining for deep reinforcement learning, provide a taxonomy of these methods, discuss each sub-field, and bring attention to open problems and future directions.

### SFP: State-free Priors for Exploration in Off-Policy Reinforcement Learning

• Computer Science
• 2022
This work introduces state-free priors, which directly model temporal consistency in demonstrated trajectories and can drive exploration in complex tasks even when trained on data collected in simpler tasks, and introduces a novel integration scheme for action priors in off-policy reinforcement learning that dynamically samples actions from a probabilistic mixture of policy and action prior.

### Reward Reports for Reinforcement Learning

• Computer Science
ArXiv
• 2022
Taking inspiration from various contributions to the technical literature on reinforcement learning, Reward Reports are outlined as living documents that track updates to design choices and assumptions behind what a particular automated system is optimizing for.

### Generalization, Mayhems and Limits in Recurrent Proximal Policy Optimization

• Computer Science
ArXiv
• 2022
This work highlights vital details that one must get right when adding recurrence to Proximal Policy Optimization in order to achieve a correct and efficient implementation, namely: properly shaping the neural network's forward pass, arranging the training data, and correspondingly selecting hidden states for sequence beginnings and masks for loss computation.

### Efficient Offline Policy Optimization with a Learned Model

• Computer Science
ArXiv
• 2022
This paper uses a regularized one-step look-ahead approach with the learned model, constructing an advantage estimate from a one-step rollout of Monte-Carlo Tree Search under a low compute budget.

## References

Showing 1–10 of 118 references

### Deep Reinforcement Learning that Matters

• Computer Science
AAAI
• 2018
Challenges posed by reproducibility, proper experimental techniques, and reporting procedures are investigated and guidelines to make future results in deep RL more reproducible are suggested.

### Q-Value Weighted Regression: Reinforcement Learning with Limited Data

• Computer Science
2022 International Joint Conference on Neural Networks (IJCNN)
• 2022
This work builds upon Advantage Weighted Regression (AWR), an off-policy actor-critic algorithm that performs very well on continuous control tasks, but has low sample efficiency and struggles with high-dimensional observation spaces.

### A Survey on Reproducibility by Evaluating Deep Reinforcement Learning Algorithms on Real-World Robots

• Computer Science
CoRL
• 2019
A rigorous, standardised evaluation approach is shown to ease the documentation, evaluation, and fair comparison of different algorithms, emphasising the importance of choosing the right measurement metrics and conducting proper statistics on the results for unbiased reporting.

### SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning

• Computer Science
ICML
• 2021
SUNRISE is a simple unified ensemble method that is compatible with various off-policy RL algorithms and significantly improves the performance of existing off-policy RL algorithms, such as Soft Actor-Critic and Rainbow DQN, on both continuous and discrete control tasks in both low-dimensional and high-dimensional environments.

### Munchausen Reinforcement Learning

• Computer Science
NeurIPS
• 2020
It is shown that slightly modifying Deep Q-Network (DQN) in that way provides an agent that is competitive with distributional methods on Atari games, without making use of distributional RL, n-step returns or prioritized replay.

### Distributional Reinforcement Learning with Quantile Regression

• Computer Science
AAAI
• 2018
This paper examines methods of learning the value distribution instead of the value function in reinforcement learning, and presents a novel distributional reinforcement learning algorithm consistent with the theoretical formulation.
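The quantile-regression idea behind this algorithm rests on the pinball loss, which penalizes over- and under-estimation asymmetrically so that minimizing it recovers the tau-th quantile of the target distribution. QR-DQN itself uses a Huber-smoothed variant of this loss; the plain form below is a simplified sketch with hypothetical naming.

```python
def quantile_loss(pred, target, tau):
    """Pinball (quantile) loss for quantile level tau in (0, 1).

    Under-prediction (target > pred) costs tau per unit of error;
    over-prediction costs (1 - tau) per unit. Minimizing the expected
    loss over targets drives pred toward the tau-th quantile.
    """
    diff = target - pred
    return max(tau * diff, (tau - 1) * diff)
```

For tau = 0.5 this reduces to half the absolute error (median regression); for tau near 1 under-estimates are punished far more than over-estimates, which is how a set of such losses at different tau values learns a whole value distribution rather than its mean.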

### Let's Play Again: Variability of Deep Reinforcement Learning Agents in Atari Environments

• Computer Science
ArXiv
• 2019
This work makes the case for reporting post-training agent performance as a distribution, rather than a point estimate, and demonstrates the variability of common agents used in the popular OpenAI Baselines repository.

### Is Deep Reinforcement Learning Really Superhuman on Atari? Leveling the playing field

• Computer Science
• 2019
This work introduces SABER, a Standardized Atari BEnchmark for general Reinforcement learning algorithms and uses it to evaluate the current state of the art, Rainbow, and introduces a human world records baseline, and argues that previous claims of expert or superhuman performance of DRL might not be accurate.

### Reinforcement Learning with Unsupervised Auxiliary Tasks

• Computer Science
ICLR
• 2017
This paper significantly outperforms the previous state of the art on Atari, averaging 880% expert human performance, and on a challenging suite of first-person, three-dimensional Labyrinth tasks achieves a mean speedup in learning of 10× while averaging 87% expert human performance.

### Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding

It is concluded that reinforcement learning can work robustly in conjunction with function approximators, and that there is little justification at present for avoiding the case of general λ.