Corpus ID: 75134905

Imitation Learning of Factored Multi-agent Reactive Models

Michael Teng, Tuan Anh Le, A. Scibior, Frank Wood
We apply recent advances in deep generative modeling to the task of imitation learning from biological agents. Specifically, we use variations of the variational recurrent neural network model in a multi-agent setting, where we learn policies of individual uncoordinated agents acting on their perceptual inputs and their hidden belief states. We learn stochastic policies for these agents directly from observational data, without constructing a reward function. An inference network learned…
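As a concrete illustration of the setup the abstract describes, the factored reactive structure can be sketched as follows. This is a minimal sketch under stated assumptions: the linear belief update, the softmax policy over a toy action set, and the names `update_belief` / `policy` are all illustrative choices, not the paper's architecture.

```python
import math
import random

# Factored multi-agent reactive model, toy version: each agent keeps its own
# hidden belief state, updates it from its perceptual input, and samples an
# action from a stochastic policy. Agents are uncoordinated: no agent sees
# another agent's state.

ACTIONS = [-1, 0, 1]  # e.g. turn left / go straight / turn right (assumed)

def update_belief(belief, observation, w=0.9):
    # Recurrent belief update: exponential moving average of observations.
    return w * belief + (1.0 - w) * observation

def policy(belief, temperature=1.0):
    # Stochastic policy: softmax over action preferences derived from belief.
    prefs = [-abs(a - belief) / temperature for a in ACTIONS]
    z = sum(math.exp(p) for p in prefs)
    probs = [math.exp(p) / z for p in prefs]
    return random.choices(ACTIONS, weights=probs, k=1)[0]

def step_all(beliefs, observations):
    # One time step for all agents; each acts only on its own state and input.
    new_beliefs = [update_belief(b, o) for b, o in zip(beliefs, observations)]
    actions = [policy(b) for b in new_beliefs]
    return new_beliefs, actions
```

Learning such a model from demonstrations would fit the belief update and policy parameters to observed trajectories; here they are fixed by hand purely to show the factored structure.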
1 Citation


Quantifying behavior to understand the brain

An overview of the latest advances in motion tracking and behavior prediction is provided, along with a discussion of how quantitative descriptions of behavior can be leveraged to connect brain activity with animal movements, with the ultimate goal of resolving the relationship between neural circuits, cognitive processes, and behavior.



References

Unsupervised Perceptual Rewards for Imitation Learning

This work presents a method that can identify key intermediate steps of a task from only a handful of demonstration sequences, and automatically discover the most discriminative features for recognizing those steps.

Generating Multi-Agent Trajectories using Programmatic Weak Supervision

This work presents a hierarchical framework that can effectively learn sequential generative models for capturing coordinated multi-agent trajectory behavior, such as offensive basketball gameplay, and is inspired by recent work on leveraging programmatically produced weak labels, which it extends to the spatiotemporal regime.

An Algorithmic Perspective on Imitation Learning

This work provides an introduction to imitation learning, dividing imitation learning into directly replicating desired behavior and learning the hidden objectives of the desired behavior from demonstrations (called inverse optimal control or inverse reinforcement learning [Russell, 1998]).
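The first branch of that taxonomy, directly replicating desired behavior, reduces to supervised learning on demonstration pairs (behavioral cloning). A minimal sketch, assuming a toy 1-nearest-neighbour policy over scalar states; this is an illustration of the idea, not the survey's algorithm:

```python
# Behavioral cloning as supervised learning on (state, expert_action) pairs.
# The 1-nearest-neighbour "model" is a deliberately simple stand-in for a
# learned policy; real systems would fit a classifier or regressor.

def clone_policy(demos):
    # demos: list of (state, expert_action) pairs from demonstrations
    def policy(state):
        nearest = min(demos, key=lambda sa: abs(sa[0] - state))
        return nearest[1]
    return policy

demos = [(0.0, "left"), (1.0, "right")]
pi = clone_policy(demos)
assert pi(0.2) == "left"
assert pi(0.9) == "right"
```

The contrasting branch, inverse reinforcement learning, would instead infer the reward function that makes the demonstrations optimal and derive a policy from it.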

Neural Variational Inference and Learning in Belief Networks

This work proposes a fast non-iterative approximate inference method that uses a feedforward network to implement efficient exact sampling from the variational posterior and shows that it outperforms the wake-sleep algorithm on MNIST and achieves state-of-the-art results on the Reuters RCV1 document dataset.

Learning recurrent representations for hierarchical behavior modeling

Taking advantage of unlabeled sequences by predicting future motion significantly improves action detection performance when training labels are scarce. Simulated motion trajectories, generated by feeding motion predictions back as input to the network, look realistic and may be used to qualitatively evaluate whether the model has learned generative control rules.

Reinforcement Learning: An Introduction

This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.

Backpropagation through the Void: Optimizing control variates for black-box gradient estimation

This work introduces a general framework for learning low-variance, unbiased gradient estimators for black-box functions of random variables, and gives an unbiased, action-conditional extension of the advantage actor-critic reinforcement learning algorithm.
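The core trick behind such estimators is the control variate: subtracting a baseline from a score-function ("REINFORCE") gradient estimate leaves it unbiased but can shrink its variance. A minimal numerical sketch, assuming a toy Bernoulli model with f(x) = x, so the true gradient of E[f(x)] with respect to theta is 1; the learned, action-conditional control variates of the paper itself are not reproduced here:

```python
import random

# Score-function gradient of E[f(x)] for x ~ Bernoulli(theta), with and
# without a constant baseline b. Since E[score] = 0, subtracting b * score
# keeps the estimator unbiased while reducing variance when b ~ E[f(x)].

def score(x, theta):
    # d/dtheta log p(x; theta) for a Bernoulli
    return x / theta - (1 - x) / (1 - theta)

def grad_estimates(theta, baseline, n, seed=0):
    rng = random.Random(seed)
    ests = []
    for _ in range(n):
        x = 1 if rng.random() < theta else 0
        ests.append((x - baseline) * score(x, theta))  # f(x) = x
    return ests

def mean_var(xs):
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, v

plain = grad_estimates(0.3, baseline=0.0, n=20000)
with_cv = grad_estimates(0.3, baseline=0.3, n=20000)  # baseline = E[f(x)]
assert abs(mean_var(plain)[0] - 1.0) < 0.1      # both remain (nearly) unbiased
assert abs(mean_var(with_cv)[0] - 1.0) < 0.1
assert mean_var(with_cv)[1] < mean_var(plain)[1]  # but variance shrinks
```

The paper's contribution is learning the control variate itself (via a surrogate network) rather than fixing it by hand as done above.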

Importance Weighted Autoencoders

The importance weighted autoencoder (IWAE), a generative model with the same architecture as the VAE, but which uses a strictly tighter log-likelihood lower bound derived from importance weighting, shows empirically that IWAEs learn richer latent space representations than VAEs, leading to improved test log-likelihood on density estimation benchmarks.
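The "strictly tighter" claim follows from Jensen's inequality: with K importance samples and log weights log w_k = log p(x, z_k) - log q(z_k | x), the IWAE objective log((1/K) Σ_k w_k) is never smaller than the average single-sample ELBO (1/K) Σ_k log w_k. A minimal sketch with stand-in log weights (the weights here are synthetic, not drawn from any trained model):

```python
import math
import random

def elbo(log_weights):
    # average single-sample ELBO over K samples
    return sum(log_weights) / len(log_weights)

def iwae_bound(log_weights):
    # log-mean-exp of the log importance weights, computed stably
    m = max(log_weights)
    return m + math.log(sum(math.exp(lw - m) for lw in log_weights) / len(log_weights))

random.seed(0)
log_w = [random.gauss(-1.0, 1.0) for _ in range(8)]  # synthetic log weights
assert iwae_bound(log_w) >= elbo(log_w)  # the IWAE bound is never looser
```

As K grows, `iwae_bound` approaches log p(x) itself, which is why increasing the sample count tightens the bound.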

Generative Temporal Models with Memory

Generative Temporal Models augmented with external memory systems are introduced, and it is shown that these models store information from early in a sequence and reuse it efficiently, allowing them to perform substantially better than existing models based on well-known recurrent neural networks such as LSTMs.

Variational Inference for Monte Carlo Objectives

The first unbiased gradient estimator designed for importance-sampled objectives is developed; it is both simpler and more effective than the NVIL estimator proposed for the single-sample variational objective, and is competitive with currently used biased estimators.
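The key device in such multi-sample estimators is a leave-one-out baseline: each sample's learning signal subtracts a baseline built only from the *other* samples' weights, so the baseline stays independent of the sample it scores. A minimal sketch of that idea (an illustration under assumptions, not the paper's exact derivation), where the held-out weight is replaced by the geometric mean of the others:

```python
import math

def logmeanexp(xs):
    # numerically stable log of the mean of exp(x) over xs
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs) / len(xs))

def vimco_signals(log_w):
    # Per-sample learning signals: the multi-sample objective minus a
    # leave-one-out baseline that imputes the held-out sample's log weight
    # with the mean of the remaining log weights (geometric mean of weights).
    total = logmeanexp(log_w)
    signals = []
    for j in range(len(log_w)):
        others = log_w[:j] + log_w[j + 1:]
        held_out = others + [sum(others) / len(others)]
        signals.append(total - logmeanexp(held_out))
    return signals

sig = vimco_signals([0.0, -1.0, -2.0])
assert sig[0] > sig[2]  # higher-weight samples receive larger signals
```

Because the baseline for sample j never uses w_j, subtracting it does not bias the gradient, which is what makes the estimator unbiased without a separately learned baseline network.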