Corpus ID: 231855257

Measuring Progress in Deep Reinforcement Learning Sample Efficiency

Florian E. Dorner
Sampled environment transitions are a critical input to deep reinforcement learning (DRL) algorithms. Current DRL benchmarks often allow for the cheap and easy generation of large amounts of samples, so perceived progress in DRL does not necessarily correspond to improved sample efficiency. As simulating real-world processes is often prohibitively hard and collecting real-world experience is costly, sample efficiency is an important indicator for economically relevant applications of DRL…


Balancing Value Underestimation and Overestimation with Realistic Actor-Critic
This work proposes a novel model-free algorithm, Realistic Actor-Critic (RAC), which aims to resolve the trade-off between value underestimation and overestimation by learning a policy family over various confidence bounds of the Q-function, and constructs uncertainty-punished Q-learning (UPQ), which uses uncertainty from an ensemble of multiple critics to control the estimation bias of the Q-function.
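The core idea behind the uncertainty penalty described above can be sketched with a common pessimism heuristic: take the mean of the critic ensemble and subtract a multiple of its standard deviation. This is a minimal illustration, not RAC/UPQ's actual implementation; the function name and the `beta` coefficient are assumptions for this sketch.

```python
import numpy as np

def penalized_q_value(ensemble_q, beta):
    """Hypothetical uncertainty-penalized value estimate: mean of the
    critic ensemble's Q-estimates minus beta times their standard
    deviation, so higher disagreement among critics lowers the value."""
    q = np.asarray(ensemble_q, dtype=float)
    return q.mean() - beta * q.std()

# With beta = 0 the penalty vanishes and the plain ensemble mean remains.
print(penalized_q_value([1.0, 2.0, 3.0], beta=0.0))
```

Varying `beta` trades optimism against pessimism, which mirrors the paper's idea of a policy family over different confidence bounds.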
Towards Teachable Autonomous Agents
The purpose of this paper is to elucidate the key obstacles standing in the way towards the design of teachable and autonomous agents and focus on autotelic agents, i.e. agents equipped with forms of intrinsic motivations that enable them to represent, self-generate and pursue their own goals.


Benchmarking Deep Reinforcement Learning for Continuous Control
This work presents a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure.
Deep Reinforcement Learning that Matters
Challenges posed by reproducibility, proper experimental techniques, and reporting procedures are investigated and guidelines to make future results in deep RL more reproducible are suggested.
Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control
The significance of hyper-parameters in policy gradients for continuous control, general variance in the algorithms, and reproducibility of reported results are investigated and the guidelines on reporting novel results as comparisons against baseline methods are provided.
Do recent advancements in model-based deep reinforcement learning really improve data efficiency?
It is demonstrated that the state-of-the-art model-free Rainbow DQN algorithm can be trained using a much smaller number of samples than is commonly reported, at a fraction of the complexity and computational cost.
State of the Art Control of Atari Games Using Shallow Reinforcement Learning
This paper systematically evaluates the importance of key representational biases encoded by DQN's network by proposing simple linear representations that make use of these concepts, and obtains a computationally practical feature set that achieves competitive performance to DQN in the ALE.
Is Deep Reinforcement Learning Really Superhuman on Atari? Leveling the playing field
This work introduces SABER, a Standardized Atari BEnchmark for general Reinforcement learning algorithms and uses it to evaluate the current state of the art, Rainbow, and introduces a human world records baseline, and argues that previous claims of expert or superhuman performance of DRL might not be accurate.
Data-Efficient Reinforcement Learning with Momentum Predictive Representations
This work trains an agent to predict its own latent state representations multiple steps into the future using an encoder which is an exponential moving average of the agent's parameters, and makes predictions using a learned transition model.
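The "exponential moving average of the agent's parameters" mentioned above is a standard momentum/target-encoder update. A minimal sketch, assuming parameters are stored as flat lists of floats (the function name and `tau` are illustrative, not the paper's code):

```python
def ema_update(target_params, online_params, tau):
    """Momentum-encoder update: each target parameter moves slowly toward
    its online counterpart, theta_target <- tau*theta_target + (1-tau)*theta_online."""
    return [tau * t + (1.0 - tau) * o
            for t, o in zip(target_params, online_params)]

# A tau close to 1 keeps the target encoder slow-moving and stable.
print(ema_update([0.0, 1.0], [1.0, 1.0], tau=0.9))
```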
Deep Reinforcement Learning with Double Q-Learning
This paper proposes a specific adaptation to the DQN algorithm and shows that the resulting algorithm not only reduces the observed overestimations, as hypothesized, but that this also leads to much better performance on several games.
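The adaptation described above, Double Q-learning, decouples action selection from action evaluation: the online network picks the greedy next action, and the target network supplies its value. A minimal sketch of the target computation for a single transition (function and variable names are illustrative):

```python
import numpy as np

def double_q_target(q_online_next, q_target_next, reward, gamma, done):
    """Double Q-learning bootstrap target: the online network selects
    the next action, the target network evaluates it, which reduces
    the overestimation bias of plain max-based Q-learning."""
    best_action = int(np.argmax(q_online_next))   # selection: online net
    bootstrap = q_target_next[best_action]        # evaluation: target net
    return reward + gamma * (1.0 - done) * bootstrap

# Online net prefers action 1; target net evaluates that same action.
print(double_q_target(np.array([1.0, 2.0]), np.array([0.5, 1.5]),
                      reward=1.0, gamma=0.9, done=0.0))
```

Note that plain DQN would instead take `max(q_target_next)` directly, letting the same network both select and evaluate, which is the source of the overestimation the paper addresses.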
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
A new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) is developed that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation.
Information-Directed Exploration for Deep Reinforcement Learning
This work builds on recent advances in distributional reinforcement learning and proposes a novel, tractable approximation of IDS for deep Q-learning and explicitly accounts for both parametric uncertainty and heteroscedastic observation noise.