Corpus ID: 15238391

Playing Atari with Deep Reinforcement Learning

@article{Mnih2013PlayingAW,
  title={Playing Atari with Deep Reinforcement Learning},
  author={Volodymyr Mnih and Koray Kavukcuoglu and David Silver and Alex Graves and Ioannis Antonoglou and Daan Wierstra and Martin A. Riedmiller},
  journal={ArXiv},
  year={2013},
  volume={abs/1312.5602}
}
We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. We apply our method to seven Atari 2600 games from the Arcade Learning Environment, with no adjustment of the architecture or learning algorithm. We find that it… 
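
The training signal the abstract describes is the one-step Q-learning target, regressed against by a convolutional network reading raw frames. As a rough sketch, assuming an 84x84, 4-frame-stacked input (the layer sizes below follow the paper's high-level description, but the optimizer, discount factor, and loss details are illustrative assumptions, and the paper's experience replay and frame preprocessing are omitted):

```python
# Hedged sketch of the Q-learning update described in the abstract.
# Hyperparameters and loss choice here are assumptions, not the paper's exact setup.
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Convolutional net mapping a stack of 4 grayscale 84x84 frames
    to one Q-value per action."""
    def __init__(self, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4), nn.ReLU(),   # 84x84 -> 20x20
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),  # 20x20 -> 9x9
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256), nn.ReLU(),
            nn.Linear(256, num_actions),
        )

    def forward(self, x):
        return self.net(x)

def q_learning_loss(q_net, batch, gamma=0.99):
    """One-step TD loss: y = r + gamma * max_a' Q(s', a') for non-terminal s'."""
    s, a, r, s_next, done = batch  # tensors; `done` is 1.0 at episode ends
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():  # targets are held fixed during the gradient step
        q_next = q_net(s_next).max(dim=1).values
        target = r + gamma * (1.0 - done) * q_next
    return nn.functional.mse_loss(q_sa, target)
```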

Citations

Distributed Deep Q-Learning

We propose a distributed deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is based on the deep Q-network (DQN).

Deep Reinforcement Learning With Macro-Actions

This paper focuses on macro-actions, and evaluates these on different Atari 2600 games, where they yield significant improvements in learning speed and can even achieve better scores than DQN.

Learning to play SLITHER.IO with deep reinforcement learning

This project uses deep reinforcement learning to train an agent to play the massively multiplayer online game SLITHER.IO, incorporating human demonstrations, reward shaping and prioritized replay in order to improve stability and successfully learn a policy.

Chrome Dino Run using Reinforcement Learning

This paper uses two popular temporal-difference approaches, Deep Q-Learning and Expected SARSA, and also implements a Double DQN model to train the agent, comparing the scores across episodes and the convergence of the algorithms with respect to timesteps.
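
The three methods named in this entry differ only in how they bootstrap the value of the next state. A schematic comparison of the targets (the helper signatures are illustrative, not from the cited paper):

```python
import numpy as np

def q_learning_target(r, q_next, gamma=0.99):
    """Q-learning bootstraps with the greedy value: r + gamma * max_a Q(s', a)."""
    return r + gamma * np.max(q_next)

def expected_sarsa_target(r, q_next, epsilon=0.1, gamma=0.99):
    """Expected SARSA averages Q(s', .) under the epsilon-greedy policy's probabilities."""
    n = len(q_next)
    probs = np.full(n, epsilon / n)
    probs[np.argmax(q_next)] += 1.0 - epsilon
    return r + gamma * np.dot(probs, q_next)

def double_dqn_target(r, q_next_online, q_next_target, gamma=0.99):
    """Double DQN selects the action with the online net but evaluates it
    with the target net, reducing the overestimation of max-based targets."""
    a_star = np.argmax(q_next_online)
    return r + gamma * q_next_target[a_star]
```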

Deep Reinforcement Learning with Regularized Convolutional Neural Fitted Q Iteration

A novel variation called Regularized Convolutional Neural Fitted Q Iteration (RCNFQ) is introduced, which incorporates convolutional neural networks, similarly to the Deep Q-Network algorithm, along with dropout regularization to improve generalization performance.

Transferring Deep Reinforcement Learning with Adversarial Objective and Augmentation

This approach enables agents to generalize knowledge from a single source task and boosts learning progress with a semi-supervised learning method when facing a new task.

Deep Q-learning using redundant outputs in visual doom

This paper proposes using redundant outputs to adapt training progress in deep reinforcement learning, and compares the method with the standard ε-greedy strategy on the ViZDoom platform.
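
The ε-greedy baseline referred to here is the standard exploration rule: act greedily with probability 1 − ε and uniformly at random otherwise. A minimal sketch:

```python
import random

def epsilon_greedy(q_values, epsilon):
    """Pick a uniformly random action with probability epsilon,
    otherwise the action with the highest estimated Q-value."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```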

Deep Reinforcement Learning for Flappy Bird

This project shows that deep reinforcement learning is very effective at learning how to play the game Flappy Bird, despite the high-dimensional sensory input.

Reinforcement Learning and Video Games

Batch normalization is a method for addressing internal covariate shift in deep neural networks, and this study demonstrates its positive influence on reinforcement learning.

Deep reinforcement learning boosted by external knowledge

A new architecture combining external knowledge and deep reinforcement learning using only visual input is presented, augmenting the image input with environment feature information and combining the two decision sources.
...

References

SHOWING 1-10 OF 32 REFERENCES

Deep auto-encoder neural networks in reinforcement learning

A framework is proposed for combining the training of deep auto-encoders (for learning compact feature spaces) with recently proposed batch-mode RL algorithms (for learning policies), with an emphasis on data efficiency and on studying the properties of the feature spaces automatically constructed by the deep auto-encoder neural networks.

Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method

NFQ, an algorithm for efficient and effective training of a Q-value function represented by a multi-layer perceptron, is introduced, and it is shown empirically that reasonably few interactions with the plant are needed to generate control policies of high quality.
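
NFQ is a batch method: gather transitions once, then repeatedly recompute one-step targets over the whole set and fit the network to them with supervised regression. A condensed sketch of that loop, where `q_fit` and `q_predict` are illustrative placeholders for the regressor's training and prediction routines:

```python
import numpy as np

def fitted_q_iteration(transitions, q_fit, q_predict, num_iters=10, gamma=0.99):
    """Batch fitted Q iteration over a fixed set of (s, a, r, s', done) tuples.

    q_fit(inputs, targets): trains the regressor on (state, action) -> target pairs
    q_predict(s): returns a vector of Q-values over actions for state s
    """
    for _ in range(num_iters):
        inputs, targets = [], []
        for s, a, r, s_next, done in transitions:
            y = r if done else r + gamma * np.max(q_predict(s_next))
            inputs.append((s, a))
            targets.append(y)
        q_fit(inputs, targets)  # e.g. a few epochs of supervised training, as in NFQ
```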

Actor-Critic Reinforcement Learning with Energy-Based Policies

This work introduces the first sound and efficient algorithm for training energy-based policies, based on an actor-critic architecture, that is computationally efficient, converges close to a local optimum, and outperforms Sallans and Hinton (2004) in several high-dimensional domains.

Reinforcement learning for robots using neural networks

This dissertation concludes that it is possible to build artificial agents that can acquire complex control policies effectively through reinforcement learning, enabling applications to complex robot-learning problems.

Learning multiple layers of representation

Reinforcement Learning with Factored States and Actions

A novel method is presented for approximating the value function and selecting good actions in Markov decision processes with large state and action spaces, and it is shown that the product-of-experts approximation can be used to solve large problems.

Bayesian Learning of Recursively Factored Environments

This paper introduces the class of recursively decomposable factorizations, and shows how exact Bayesian inference can be used to efficiently guarantee predictive performance close to the best factorization in this class.

Reinforcement Learning: An Introduction

This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.

A Neuroevolution Approach to General Atari Game Playing

Results suggest that neuroevolution is a promising approach to general video game playing (GVGP), achieving state-of-the-art results and even surpassing human high scores on three games.

Why did TD-Gammon Work?

This work develops a competitive evaluation function on a 4,000-parameter feed-forward neural network, without using back-propagation, reinforcement learning, or temporal-difference methods, applying simple hill-climbing in a relative fitness environment.