Deep Reinforcement Learning at the Edge of the Statistical Precipice
@inproceedings{Agarwal2021DeepRL,
  title     = {Deep Reinforcement Learning at the Edge of the Statistical Precipice},
  author    = {Rishabh Agarwal and Max Schwarzer and Pablo Samuel Castro and Aaron C. Courville and Marc G. Bellemare},
  booktitle = {Neural Information Processing Systems},
  year      = {2021}
}
Deep reinforcement learning (RL) algorithms are predominantly evaluated by comparing their relative performance on a large suite of tasks. Most published results on deep RL benchmarks compare point estimates of aggregate performance such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs. Beginning with the Arcade Learning Environment (ALE), the shift towards computationally demanding benchmarks has led to the…
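The paper's remedy for unreliable point estimates is to report robust aggregates such as the interquartile mean (IQM) together with bootstrap confidence intervals over runs. A minimal sketch of that idea in plain NumPy (the function names and the example scores here are illustrative, not from the paper; the authors' own tooling is the `rliable` library, which additionally stratifies the bootstrap across tasks):

```python
import numpy as np

def iqm(scores):
    """Interquartile mean: the mean of the middle 50% of values."""
    x = np.sort(np.asarray(scores, dtype=float).ravel())
    n = len(x)
    return x[n // 4 : n - n // 4].mean()

def bootstrap_ci(scores, stat=iqm, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for a statistic
    computed over per-run scores (resampling runs with replacement)."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    boots = [stat(rng.choice(scores, size=len(scores), replace=True))
             for _ in range(n_boot)]
    lo, hi = np.percentile(boots, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

# Hypothetical example: normalized scores from 10 training runs.
runs = np.array([0.8, 1.1, 0.9, 1.3, 0.7, 1.0, 1.2, 0.95, 1.05, 0.85])
point = iqm(runs)          # robust aggregate over runs
lo, hi = bootstrap_ci(runs)  # uncertainty from the finite number of runs
```

Reporting the interval `[lo, hi]` alongside `point` makes explicit how much a ranking could change under a different draw of runs, which is the paper's central complaint about bare point estimates.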
Figures and Tables from this paper
Figures 1–12, A.13–A.26, A.28–A.33; Table 1.
158 Citations
The Impact of Task Underspecification in Evaluating Deep Reinforcement Learning
- Computer Science, ArXiv
- 2022
This article augments DRL evaluations to consider parameterized families of MDPs, and shows that in comparison to evaluating DRL methods on select MDP instances, evaluating the MDP family often yields a substantially different relative ranking of methods, casting doubt on what methods should be considered state-of-the-art.
Mildly Conservative Q-Learning for Offline Reinforcement Learning
- Computer Science, ArXiv
- 2022
This paper proposes Mildly Conservative Q-learning (MCQ), in which OOD actions are actively trained by assigning them proper pseudo Q-values, and theoretically shows that MCQ induces a policy that behaves at least as well as the behavior policy, with no erroneous overestimation for OOD actions.
An Empirical Study of Implicit Regularization in Deep Offline RL
- Computer Science, ArXiv
- 2022
It is observed that a direct association exists only in restricted settings and disappears in more extensive hyperparameter sweeps; bootstrapping alone is found to be insufficient to explain the collapse of the effective rank.
Adaptively Calibrated Critic Estimates for Deep Reinforcement Learning
- Computer Science, IEEE Robotics and Automation Letters
- 2023
A general method called Adaptively Calibrated Critics (ACC) is proposed that uses the most recent high variance but unbiased on-policy rollouts to alleviate the bias of the low variance temporal difference targets.
Bridging the Gap Between Offline and Online Reinforcement Learning Evaluation Methodologies
- Computer Science, ArXiv
- 2022
A sequential approach is proposed to evaluate offline RL algorithms as a function of the training set size, and thus by their data efficiency, which provides valuable insights into the data efficiency of the learning process and the robustness of algorithms to distribution changes in the dataset.
The Primacy Bias in Deep Reinforcement Learning
- Computer Science, ICML
- 2022
This work proposes a simple yet generally-applicable mechanism that tackles the primacy bias of deep reinforcement learning algorithms by periodically resetting a part of the agent.
Pretraining in Deep Reinforcement Learning: A Survey
- Computer Science, ArXiv
- 2022
This survey seeks to systematically review existing works in pretraining for deep reinforcement learning, provide a taxonomy of these methods, discuss each sub-field, and bring attention to open problems and future directions.
SFP: State-free Priors for Exploration in Off-Policy Reinforcement Learning
- Computer Science
- 2022
This work introduces state-free priors, which directly model temporal consistency in demonstrated trajectories, and are capable of driving exploration in complex tasks, even when trained on data collected on simpler tasks, and introduces a novel integration scheme for action priors in off-policy reinforcement learning by dynamically sampling actions from a probabilistic mixture of policy and action prior.
Democratizing RL Research by Reusing Prior Computation
- Computer Science
- 2022
As deep RL research moves towards more complex and challenging benchmarks, the computational barrier to entry in RL research will become substantially higher still, due to the inefficiency of tabula rasa RL.
Reward Reports for Reinforcement Learning
- Computer Science, ArXiv
- 2022
Taking inspiration from various contributions to the technical literature on reinforcement learning, Reward Reports are outlined as living documents that track updates to design choices and assumptions behind what a particular automated system is optimizing for.
References
SHOWING 1-10 OF 118 REFERENCES
Deep Reinforcement Learning that Matters
- Computer Science, AAAI
- 2018
Challenges posed by reproducibility, proper experimental techniques, and reporting procedures are investigated and guidelines to make future results in deep RL more reproducible are suggested.
Q-Value Weighted Regression: Reinforcement Learning with Limited Data
- Computer Science, 2022 International Joint Conference on Neural Networks (IJCNN)
- 2022
This work builds upon Advantage Weighted Regression (AWR), an off-policy actor-critic algorithm that performs very well on continuous control tasks, but has low sample efficiency and struggles with high-dimensional observation spaces.
Distributional Reinforcement Learning with Quantile Regression
- Computer Science, AAAI
- 2018
This paper examines methods of learning the value distribution instead of the value function in reinforcement learning, and presents a novel distributional reinforcement learning algorithm consistent with the theoretical formulation.
A Survey on Reproducibility by Evaluating Deep Reinforcement Learning Algorithms on Real-World Robots
- Computer Science, CoRL
- 2019
A rigorous and standardised evaluation approach is presented to ease the documentation, evaluation, and fair comparison of different algorithms, emphasising the importance of choosing the right measurement metrics and conducting proper statistics on the results for unbiased reporting.
SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning
- Computer Science, ICML
- 2021
SUNRISE is a simple unified ensemble method, which is compatible with various off-policy RL algorithms and significantly improves the performance of existing off-policy RL algorithms, such as Soft Actor-Critic and Rainbow DQN, for both continuous and discrete control tasks in both low-dimensional and high-dimensional environments.
Munchausen Reinforcement Learning
- Computer Science, NeurIPS
- 2020
It is shown that slightly modifying Deep Q-Network (DQN) in that way provides an agent that is competitive with distributional methods on Atari games, without making use of distributional RL, n-step returns or prioritized replay.
Let's Play Again: Variability of Deep Reinforcement Learning Agents in Atari Environments
- Computer Science, ArXiv
- 2019
This work makes the case for reporting post-training agent performance as a distribution, rather than a point estimate, and demonstrates the variability of common agents used in the popular OpenAI Baselines repository.
Reinforcement Learning with Unsupervised Auxiliary Tasks
- Computer Science, ICLR
- 2017
This paper significantly outperforms the previous state-of-the-art on Atari, averaging 880% expert human performance, and on a challenging suite of first-person, three-dimensional Labyrinth tasks achieves a mean speedup in learning of 10× while averaging 87% expert human performance.
Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding
- Computer Science, NIPS
- 1995
It is concluded that reinforcement learning can work robustly in conjunction with function approximators, and that there is little justification at present for avoiding the case of general λ.
SEERL: Sample Efficient Ensemble Reinforcement Learning
- Computer Science, AAMAS
- 2021
It is shown that learning an adequately diverse set of policies is required for a good ensemble, while extreme diversity can prove detrimental to overall performance; the framework is seen to outperform state-of-the-art (SOTA) scores on Atari 2600 and MuJoCo.