• Corpus ID: 53023256

Inverse reinforcement learning for video games

@article{Tucker2018InverseRL,
  title={Inverse reinforcement learning for video games},
  author={Aaron Tucker and Adam Gleave and Stuart J. Russell},
  journal={ArXiv},
  year={2018},
  volume={abs/1810.10593}
}
Deep reinforcement learning achieves superhuman performance in a range of video game environments, but requires that a designer manually specify a reward function. It is often easier to provide demonstrations of a target behavior than to design a reward function describing that behavior. Inverse reinforcement learning (IRL) algorithms can infer a reward from demonstrations in low-dimensional continuous control environments, but there has been little work on applying IRL to high-dimensional… 
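To make this concrete, here is a minimal sketch of the adversarial-IRL setup the abstract refers to, applied to pixel observations: a small convolutional network scores raw frames, and a discriminator built from that score and the policy's log-probability is trained to separate expert frames from policy frames. This is an illustrative assumption, not the paper's released code; the 84x84 grayscale input, the network sizes, and all function names are made up for the example.

```python
# Illustrative sketch only (assumed shapes and names, not the paper's code):
# an AIRL-style discriminator over 84x84 grayscale frames.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardNet(nn.Module):
    """Convolutional reward model r_theta(s) scoring single frames."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
        )
        self.head = nn.Linear(32 * 9 * 9, 1)    # 84x84 input -> 9x9 feature map

    def forward(self, obs):                     # obs: (batch, 1, 84, 84) in [0, 1]
        features = self.conv(obs).flatten(start_dim=1)
        return self.head(features).squeeze(-1)  # one scalar reward per frame

def discriminator_logits(reward_net, obs, log_pi):
    # AIRL-style discriminator D = exp(r) / (exp(r) + pi), written as the
    # logit r(s) - log pi(a|s); log_pi is the policy's log-probability.
    return reward_net(obs) - log_pi

def discriminator_loss(reward_net, expert_obs, expert_log_pi,
                       policy_obs, policy_log_pi):
    # Binary cross-entropy: expert samples labeled 1, policy samples labeled 0.
    expert_logits = discriminator_logits(reward_net, expert_obs, expert_log_pi)
    policy_logits = discriminator_logits(reward_net, policy_obs, policy_log_pi)
    return (F.binary_cross_entropy_with_logits(expert_logits,
                                               torch.ones_like(expert_logits))
            + F.binary_cross_entropy_with_logits(policy_logits,
                                                 torch.zeros_like(policy_logits)))
```

In such a setup, the policy would be optimized against the learned score (for example, using log D - log(1 - D) as its reward), with the discriminator periodically refit on fresh policy rollouts.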

Citations

Demonstration-Efficient Inverse Reinforcement Learning in Procedurally Generated Environments
TLDR
This work proposes a technique based on Adversarial Inverse Reinforcement Learning which can significantly decrease the need for expert demonstrations in PCG games and demonstrates the effectiveness of the technique on two procedural environments, MiniGrid and DeepCrawl.
Measuring the Impact of Memory Replay in Training Pacman Agents using Reinforcement Learning
TLDR
This research analyzes the impact of three different memory replay techniques on the performance of a Deep Q-Learning model across different difficulty levels of the Pacman video game, and proposes a multi-channel image, a novel way to create input tensors for training the model, inspired by one-hot encoding.
Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations
TLDR
A novel reward-learning-from-observation algorithm, Trajectory-ranked Reward EXtrapolation (T-REX), is proposed that extrapolates beyond a set of (approximately) ranked demonstrations in order to infer high-quality reward functions from a set of potentially poor demonstrations.
Expert-Level Atari Imitation Learning from Demonstrations Only
TLDR
HashReward is a novel imitation learning algorithm that uses supervised hashing to train the discriminator effectively, and is the first pure imitation learning approach to achieve expert-comparable performance in Atari game environments with raw pixel inputs.
Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution
TLDR
Align-RUDDER is introduced: RUDDER with two major modifications, replacing RUDDER's LSTM model with a profile model obtained from multiple sequence alignment of demonstrations, which considerably reduces the delay of rewards and thus speeds up learning.
Learning to Weight Imperfect Demonstrations
TLDR
Theoretical analysis suggests that with the estimated weights the agent can learn a policy that improves on the plain expert demonstrations, and experiments in the MuJoCo and Atari environments demonstrate that the proposed algorithm outperforms baseline methods in handling imperfect expert demonstrations.
Efficiently Guiding Imitation Learning Algorithms with Human Gaze
TLDR
This work uses gaze cues from human demonstrators to enhance the performance of state-of-the-art inverse reinforcement learning (IRL) and behavior cloning (BC) algorithms, without adding any additional learnable parameters to those models.
Scoring-Aggregating-Planning: Learning task-agnostic priors from interactions and sparse rewards for zero-shot generalization
TLDR
Scoring-Aggregating-Planning (SAP) is proposed, a framework that learns task-agnostic semantics and dynamics priors from arbitrary-quality interactions under sparse reward and then plans on unseen tasks in a zero-shot setting.
ALIGN-RUDDER: LEARNING FROM FEW DEMONSTRATIONS BY REWARD REDISTRIBUTION
TLDR
Align-RUDDER inherits the concept of reward redistribution, which speeds up learning by reducing the delay of rewards, and outperforms competitors on complex artificial tasks with delayed reward and few demonstrations.
Predicting Goal-Directed Human Attention Using Inverse Reinforcement Learning
TLDR
The first inverse reinforcement learning (IRL) model to learn the internal reward function and policy used by humans during visual search is proposed; it models the viewer's internal belief states as dynamic contextual belief maps of object locations and recovers distinctive target-dependent patterns of object prioritization.

References

SHOWING 1-10 OF 39 REFERENCES
Playing hard exploration games by watching YouTube
TLDR
A two-stage method of one-shot imitation is presented that allows an agent to convincingly exceed human-level performance on the infamously hard exploration games Montezuma's Revenge, Pitfall!, and Private Eye for the first time, even when the agent is not presented with any environment rewards.
Learning Robust Rewards with Adversarial Inverse Reinforcement Learning
TLDR
It is demonstrated that AIRL is able to recover reward functions that are robust to changes in dynamics, enabling us to learn policies even under significant variation in the environment seen during training.
A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models
TLDR
It is shown that certain IRL methods are in fact mathematically equivalent to GANs; in particular, an equivalence is established between a sample-based algorithm for maximum entropy IRL and a GAN in which the generator's density can be evaluated and is provided as an additional input to the discriminator (the discriminator form is sketched after this reference list).
Human-level control through deep reinforcement learning
TLDR
This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Deep Reinforcement Learning from Human Preferences
TLDR
This work explores goals defined in terms of (non-expert) human preferences between pairs of trajectory segments in order to effectively solve complex RL tasks without access to the reward function, including Atari games and simulated robot locomotion.
Deep Q-learning From Demonstrations
TLDR
This paper presents an algorithm, Deep Q-learning from Demonstrations (DQfD), that leverages even relatively small amounts of demonstration data to massively accelerate the learning process, and that automatically assesses the necessary ratio of demonstration data while learning thanks to a prioritized replay mechanism.
Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation
TLDR
This work proposes an imitation learning method based on video prediction with context translation and deep reinforcement learning that enables a variety of interesting applications, including learning robotic skills that involve tool use simply by observing videos of human tool use.
Apprenticeship learning via inverse reinforcement learning
TLDR
This work thinks of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and gives an algorithm for learning the task demonstrated by the expert, based on using "inverse reinforcement learning" to try to recover the unknown reward function.
Maximum Entropy Deep Inverse Reinforcement Learning
TLDR
It is shown that the Maximum Entropy paradigm for IRL lends itself naturally to the efficient training of deep architectures, and the approach achieves performance commensurate to the state-of-the-art on existing benchmarks while exceeding on an alternative benchmark based on highly varying reward structures.
Model-Free Deep Inverse Reinforcement Learning by Logistic Regression
  • E. Uchibe • Computer Science • Neural Processing Letters • 2017
TLDR
This paper proposes model-free deep inverse reinforcement learning that finds nonlinear reward function structures by casting the problem as density ratio estimation, and shows that, under the framework of linearly solvable Markov decision processes, the log-ratio between an optimal state transition and a baseline one is given by a part of the reward plus the difference of the value functions.
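As a hedged aside on the GAN-IRL equivalence cited above, the discriminator form that equivalence refers to can be written as follows; the notation is assumed for illustration rather than quoted from this page, with R_theta the learned reward, Z an estimate of the partition function, and q(tau) the generator's (policy's) trajectory density supplied to the discriminator as an extra input.

```latex
% Sketch of the discriminator in the maximum-entropy GAN-IRL equivalence
% (illustrative notation, assumed rather than quoted from the page).
\[
  D_\theta(\tau)
  = \frac{\tfrac{1}{Z}\exp\big(R_\theta(\tau)\big)}
         {\tfrac{1}{Z}\exp\big(R_\theta(\tau)\big) + q(\tau)}
\]
% At the optimum q(\tau) \propto \exp(R_\theta(\tau)), so fitting this
% discriminator recovers the maximum-entropy IRL objective.
```

This is also why a generator whose density can be evaluated is needed: q(tau) appears explicitly inside the discriminator.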