• Corpus ID: 5176587

Noisy Networks for Exploration

@article{Fortunato2018NoisyNF,
  title={Noisy Networks for Exploration},
  author={Meire Fortunato and Mohammad Gheshlaghi Azar and Bilal Piot and Jacob Menick and Ian Osband and Alex Graves and Vlad Mnih and R{\'e}mi Munos and Demis Hassabis and Olivier Pietquin and Charles Blundell and Shane Legg},
  journal={ArXiv},
  year={2018},
  volume={abs/1706.10295}
}
We introduce NoisyNet, a deep reinforcement learning agent with parametric noise added to its weights, and show that the induced stochasticity of the agent's policy can be used to aid efficient exploration. Key result: replacing the conventional exploration heuristics for A3C, DQN and dueling agents (entropy reward and $\epsilon$-greedy respectively) with NoisyNet yields substantially higher scores for a wide range of Atari games, in some cases advancing the agent from sub- to super-human…
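
As a rough sketch of the core idea (not the authors' released code), a factorised noisy linear layer in PyTorch might look as follows; the initialisation constants follow the paper's description, but the class name and the use of PyTorch are choices of this illustration.

```python
import math
import torch
import torch.nn as nn

class NoisyLinear(nn.Module):
    """Linear layer with factorised Gaussian parameter noise (illustrative sketch)."""

    def __init__(self, in_features, out_features, sigma0=0.5):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features
        # Learnable means and noise scales for weights and biases.
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features))
        self.weight_sigma = nn.Parameter(torch.empty(out_features, in_features))
        self.bias_mu = nn.Parameter(torch.empty(out_features))
        self.bias_sigma = nn.Parameter(torch.empty(out_features))
        # Noise buffers: resampled, never trained.
        self.register_buffer("eps_in", torch.zeros(in_features))
        self.register_buffer("eps_out", torch.zeros(out_features))
        bound = 1.0 / math.sqrt(in_features)
        nn.init.uniform_(self.weight_mu, -bound, bound)
        nn.init.uniform_(self.bias_mu, -bound, bound)
        nn.init.constant_(self.weight_sigma, sigma0 / math.sqrt(in_features))
        nn.init.constant_(self.bias_sigma, sigma0 / math.sqrt(in_features))
        self.reset_noise()

    @staticmethod
    def _f(x):
        # f(x) = sign(x) * sqrt(|x|), used to build factorised noise.
        return x.sign() * x.abs().sqrt()

    def reset_noise(self):
        self.eps_in.copy_(self._f(torch.randn(self.in_features)))
        self.eps_out.copy_(self._f(torch.randn(self.out_features)))

    def forward(self, x):
        weight = self.weight_mu + self.weight_sigma * torch.outer(self.eps_out, self.eps_in)
        bias = self.bias_mu + self.bias_sigma * self.eps_out
        return nn.functional.linear(x, weight, bias)
```

Replacing the fully connected layers of a DQN-style network with such layers (and resampling noise each step) is what induces the stochastic policy described above.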

Citations

Exploration by Random Network Distillation

An exploration bonus for deep reinforcement learning that is easy to implement and adds minimal computational overhead is introduced, together with a method for flexibly combining intrinsic and extrinsic rewards, enabling significant progress on several hard-exploration Atari games.
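
A minimal sketch of the random-network-distillation bonus, assuming PyTorch; the network sizes, `make_net` helper, `obs_dim`, and learning rate are illustrative placeholders, not values from the paper.

```python
import torch
import torch.nn as nn

def make_net(obs_dim, feat_dim=64):
    return nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))

obs_dim = 8                    # illustrative observation size
target = make_net(obs_dim)     # fixed, randomly initialised target network
predictor = make_net(obs_dim)  # trained to match the target
for p in target.parameters():
    p.requires_grad_(False)
opt = torch.optim.Adam(predictor.parameters(), lr=1e-4)

def intrinsic_reward(obs_batch):
    """Prediction error is large on novel observations -> exploration bonus."""
    with torch.no_grad():
        tgt = target(obs_batch)
    err = (predictor(obs_batch) - tgt).pow(2).mean(dim=-1)
    # Train the predictor on the same batch, so familiar states stop being rewarded.
    loss = err.mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return err.detach()
```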

Parameter Space Noise for Exploration

This work demonstrates, through experimental comparison of DQN, DDPG, and TRPO on high-dimensional discrete-action environments as well as continuous control tasks, that RL with parameter noise learns more efficiently than either traditional RL with action-space noise or evolutionary strategies.
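
A minimal sketch of parameter-space noise with a simple adaptive noise scale, assuming PyTorch; the helper names, target distance, and multiplicative adaptation constant are illustrative rather than the paper's exact procedure.

```python
import copy
import torch

def perturb_policy(policy, sigma):
    """Return a copy of `policy` with isotropic Gaussian noise added to every parameter."""
    noisy = copy.deepcopy(policy)
    with torch.no_grad():
        for p in noisy.parameters():
            p.add_(torch.randn_like(p) * sigma)
    return noisy

def adapt_sigma(sigma, action_distance, target_distance=0.1, alpha=1.01):
    """Grow sigma if the perturbed policy acts too similarly to the unperturbed one,
    shrink it if the perturbation changes actions more than the target."""
    return sigma * alpha if action_distance < target_distance else sigma / alpha
```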

VASE: Variational Assorted Surprise Exploration for Reinforcement Learning

A new definition of surprise and its RL implementation, named Variational Assorted Surprise Exploration (VASE), are proposed; VASE uses a Bayesian neural network as a model of the environment dynamics, trained with variational inference, alternately updating the accuracy of the agent's model and its policy.

m-Stage Epsilon-Greedy Exploration for Reinforcement Learning

A generalization of $\epsilon$-greedy, called m-stage $\epsilon$-greedy, in which $\epsilon$ increases within each episode but decreases between episodes, is proposed; this ensures that by the time an agent gets to explore the later states within an episode, $\epsilon$ has not decayed too much to do any meaningful exploration.
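
The schedule could be sketched roughly as below; the stage length, maximum $\epsilon$, and inter-episode decay are illustrative placeholders, not the paper's parametrisation.

```python
def epsilon(step_in_episode, episode, m=4, steps_per_stage=250,
            eps_max=1.0, inter_episode_decay=0.99):
    """Illustrative m-stage schedule: epsilon steps up through m stages within an
    episode, while the whole schedule is annealed multiplicatively across episodes."""
    stage = min(step_in_episode // steps_per_stage, m - 1)
    within_episode = eps_max * (stage + 1) / m           # grows as the episode progresses
    return within_episode * (inter_episode_decay ** episode)  # shrinks across episodes
```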

Diversity-Driven Exploration Strategy for Deep Reinforcement Learning

By simply adding a distance measure to the loss function, the proposed method significantly enhances an agent's exploratory behaviour and prevents the policy from being trapped in local optima; an adaptive scaling method for stabilizing the learning process is also proposed.
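
As one possible reading of the summary above, the distance term might be a divergence between the current policy and recent prior policies, subtracted from the loss so that minimising it pushes the policy away from its past selves; the KL form, the coefficient `alpha`, and the choice of prior policies are assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

def diversity_regularised_loss(task_loss, pi_current, pi_priors, alpha=0.1):
    """`pi_current` and each entry of `pi_priors` are action-probability tensors
    for the same batch of states (an assumption of this sketch). The diversity
    term rewards policies that diverge from recent prior policies."""
    distances = [F.kl_div(pi_old.log(), pi_current, reduction="batchmean")
                 for pi_old in pi_priors]
    return task_loss - alpha * torch.stack(distances).mean()
```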

Langevin DQN

Langevin DQN, a variation of DQN that differs only in perturbing parameter updates with Gaussian noise, is developed, and an intuition for why Langevin DQN performs deep exploration is provided.
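
A minimal sketch of a Gaussian-perturbed parameter update in the style of stochastic gradient Langevin dynamics, assuming PyTorch; the learning rate, temperature, and exact noise scaling are illustrative rather than the paper's settings.

```python
import math
import torch

def langevin_sgd_step(params, lr=1e-4, temperature=1e-4):
    """Ordinary gradient step plus Gaussian noise with scale sqrt(2 * lr * temperature).
    Assumes `loss.backward()` has already populated `.grad` on each parameter."""
    noise_scale = math.sqrt(2.0 * lr * temperature)
    with torch.no_grad():
        for p in params:
            if p.grad is None:
                continue
            p.add_(-lr * p.grad)                         # usual gradient descent step
            p.add_(noise_scale * torch.randn_like(p))    # Langevin-style perturbation
```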

Efficient Exploration Through Bayesian Deep Q-Networks

Bayesian Deep Q-Network (BDQN), a practical Thompson-sampling-based reinforcement learning (RL) algorithm, is proposed; it can be trained with fast closed-form updates, and its samples can be drawn efficiently from a Gaussian distribution.
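
The closed-form updates can be illustrated with standard Bayesian linear regression over last-layer features; the prior and noise variances below are placeholders, and this sketch omits the deep feature extractor entirely.

```python
import numpy as np

class BayesianLastLayer:
    """Illustrative Gaussian posterior over last-layer weights for one action's Q-values,
    with prior N(0, sigma_p^2 I) and observation noise variance sigma_n^2. Thompson
    sampling draws one weight vector and acts greedily with respect to it."""

    def __init__(self, feat_dim, sigma_n=1.0, sigma_p=1.0):
        self.feat_dim, self.sigma_n, self.sigma_p = feat_dim, sigma_n, sigma_p
        self.mean = np.zeros(feat_dim)
        self.cov = sigma_p ** 2 * np.eye(feat_dim)

    def update(self, Phi, y):
        # Closed form: Sigma = (Phi^T Phi / sn^2 + I / sp^2)^-1, mu = Sigma Phi^T y / sn^2.
        precision = Phi.T @ Phi / self.sigma_n ** 2 + np.eye(self.feat_dim) / self.sigma_p ** 2
        self.cov = np.linalg.inv(precision)
        self.mean = self.cov @ Phi.T @ y / self.sigma_n ** 2

    def sample_weights(self, rng=np.random):
        return rng.multivariate_normal(self.mean, self.cov)
```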

Switching Isotropic and Directional Exploration with Parameter Space Noise in Deep Reinforcement Learning

This paper proposes a method that deforms the noise distribution according to the accumulated returns and the noise samples that led to those returns, and switches between isotropic and directional exploration in parameter space according to the obtained rewards.

Randomized Prior Functions for Deep Reinforcement Learning

It is shown that this approach is efficient with linear representations; simple illustrations of its efficacy with nonlinear representations are provided, and it scales to large-scale problems far better than previous attempts.
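
A minimal sketch of the additive randomized-prior construction, assuming PyTorch; `make_net` is a user-supplied network factory and the prior scale `beta` is an illustrative choice.

```python
import torch
import torch.nn as nn

class PriorNet(nn.Module):
    """Ensemble member whose prediction is a trainable network plus a fixed,
    randomly initialised prior network scaled by beta; only the trainable
    part receives gradients."""

    def __init__(self, make_net, beta=3.0):
        super().__init__()
        self.trainable = make_net()
        self.prior = make_net()
        self.beta = beta
        for p in self.prior.parameters():
            p.requires_grad_(False)

    def forward(self, x):
        with torch.no_grad():
            prior_out = self.prior(x)
        return self.trainable(x) + self.beta * prior_out
```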

State-Aware Variational Thompson Sampling for Deep Q-Networks

A variational Thompson sampling approximation for DQNs is proposed, which uses a deep network whose parameters are perturbed by a learned variational noise distribution; the authors hypothesize that such state-aware noisy exploration is particularly useful in problems where exploring certain high-risk states may cause the agent to fail badly.
...

References

Showing 1-10 of 47 references

Parameter Space Noise for Exploration

This work demonstrates, through experimental comparison of DQN, DDPG, and TRPO on high-dimensional discrete-action environments as well as continuous control tasks, that RL with parameter noise learns more efficiently than either traditional RL with action-space noise or evolutionary strategies.

VIME: Variational Information Maximizing Exploration

VIME is introduced, an exploration strategy based on maximization of information gain about the agent's belief of environment dynamics which efficiently handles continuous state and action spaces and can be applied with several different underlying RL algorithms.

Continuous control with deep reinforcement learning

This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.

BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems

A new algorithm is presented that significantly improves the efficiency of exploration for deep Q-learning agents in dialogue systems, and it is shown that spiking the replay buffer with experiences from just a few successful episodes can make Q-learning feasible when it might otherwise fail.

Dueling Network Architectures for Deep Reinforcement Learning

This paper presents a new neural network architecture for model-free reinforcement learning that leads to better policy evaluation in the presence of many similar-valued actions and enables the RL agent to outperform the state-of-the-art on the Atari 2600 domain.
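
A minimal sketch of the dueling aggregation (state value plus mean-subtracted advantage), assuming PyTorch; hidden sizes are illustrative.

```python
import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    """Separate value and advantage streams, combined as
    Q(s, a) = V(s) + A(s, a) - mean_a A(s, a) so that V and A are identifiable."""

    def __init__(self, feat_dim, n_actions, hidden=128):
        super().__init__()
        self.value = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.advantage = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions))

    def forward(self, features):
        v = self.value(features)                    # (batch, 1)
        a = self.advantage(features)                # (batch, n_actions)
        return v + a - a.mean(dim=1, keepdim=True)  # (batch, n_actions)
```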

Asynchronous Methods for Deep Reinforcement Learning

A conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers and shows that asynchronous actor-critic succeeds on a wide variety of continuous motor control problems as well as on a new task of navigating random 3D mazes using a visual input.

Deep Exploration via Bootstrapped DQN

Efficient exploration in complex environments remains a major challenge for reinforcement learning. We propose bootstrapped DQN, a simple algorithm that explores in a computationally and statistically efficient manner through use of randomized value functions.
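
A minimal sketch of the bootstrapped architecture, assuming PyTorch; the number of heads, hidden size, and the toy dimensions in the usage lines are illustrative.

```python
import random
import torch
import torch.nn as nn

class BootstrappedQNet(nn.Module):
    """K independent Q-value heads on a shared torso. One head is sampled at the
    start of each episode and followed greedily, giving temporally consistent
    ('deep') exploration."""

    def __init__(self, obs_dim, n_actions, k=10, hidden=128):
        super().__init__()
        self.torso = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList([nn.Linear(hidden, n_actions) for _ in range(k)])

    def forward(self, obs, head_idx):
        return self.heads[head_idx](self.torso(obs))

# Usage sketch: pick one head per episode and act greedily with it throughout.
net = BootstrappedQNet(obs_dim=8, n_actions=4)
active_head = random.randrange(len(net.heads))
q_values = net(torch.zeros(1, 8), active_head)
```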

Generalization and Exploration via Randomized Value Functions

The results suggest that randomized value functions offer a promising approach to tackling a critical challenge in reinforcement learning: synthesizing efficient exploration and effective generalization.

Efficient Exploration for Dialog Policy Learning with Deep BBQ Networks & Replay Buffer Spiking

This work introduces an exploration technique based on Thompson sampling, drawing Monte Carlo samples from a Bayes-by-backprop neural network, demonstrating marked improvement over common approaches such as $\epsilon$-greedy and Boltzmann exploration.

Efficient Exploration for Dialogue Policy Learning with BBQ Networks & Replay Buffer Spiking

This work introduces an exploration technique based on Thompson sampling, drawing Monte Carlo samples from a Bayes-by-backprop neural network, demonstrating marked improvement over common approaches such as $\epsilon$-greedy and Boltzmann exploration.