Corpus ID: 208139527

Working Memory Graphs

@inproceedings{Loynd2020WorkingMG,
  title={Working Memory Graphs},
  author={Ricky Loynd and Roland Fernandez and Asli Çelikyilmaz and Adith Swaminathan and Matthew J. Hausknecht},
  booktitle={ICML},
  year={2020}
}
Transformers have increasingly outperformed gated RNNs in obtaining new state-of-the-art results on supervised tasks involving text sequences. Inspired by this trend, we study the question of how Transformer-based models can improve the performance of sequential decision-making agents. We present the Working Memory Graph (WMG), an agent that employs multi-head self-attention to reason over a dynamic set of vectors representing observed and recurrent state. We evaluate WMG in three environments…
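As a rough illustration of the idea in the abstract (multi-head self-attention over a dynamic set of observation vectors and recurrent state vectors), here is a minimal PyTorch-style sketch. The class names, sizes, and the memo-update rule below are assumptions made for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class WMGCore(nn.Module):
    """Illustrative sketch of attention over observed and recurrent state vectors.
    Names, sizes, and the memo-update rule are assumptions, not the paper's code."""

    def __init__(self, d_model=128, num_heads=4, num_memos=8, num_actions=6):
        super().__init__()
        self.attn = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=num_heads, dim_feedforward=256, batch_first=True)
        self.memo_head = nn.Linear(d_model, d_model)       # produces the next recurrent vector
        self.policy_head = nn.Linear(d_model, num_actions)

    def forward(self, obs_vectors, memos):
        # obs_vectors: (batch, n_obs, d_model)     factored observation for this step
        # memos:       (batch, num_memos, d_model) recurrent state carried across steps
        tokens = torch.cat([obs_vectors, memos], dim=1)
        hidden = self.attn(tokens)                         # multi-head self-attention over all vectors
        pooled = hidden.mean(dim=1)
        new_memo = torch.tanh(self.memo_head(pooled)).unsqueeze(1)
        next_memos = torch.cat([memos[:, 1:], new_memo], dim=1)  # drop oldest memo, append new one
        return self.policy_head(pooled), next_memos

# Hypothetical usage with random inputs:
core = WMGCore()
obs = torch.randn(1, 5, 128)
memos = torch.zeros(1, 8, 128)
logits, memos = core(obs, memos)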
Citations

Multi-Task Learning for User Engagement and Adoption in Live Video Streaming Events
A multitask deep reinforcement learning model to select the time of a live video streaming event, aiming to optimize the viewer’s engagement and adoption at the same time, is presented.
My Body is a Cage: the Role of Morphology in Graph-Based Incompatible Control
Motivated by the hypothesis that any benefits GNNs extract from the graph structure are outweighed by difficulties they create for message passing, Amorpheus is proposed, a transformer-based approach that substantially outperforms GNN-based methods.
Snowflake: Scaling GNNs to High-Dimensional Continuous Control via Parameter Freezing
SNOWFLAKE is introduced, a GNN training method for high-dimensional continuous control that freezes parameters in parts of the network that suffer from overfitting; it significantly boosts the performance of GNNs for locomotion control on large agents, now matching the performance of MLPs, and with superior transfer properties.
FactoredRL: Leveraging Factored Graphs for Deep Reinforcement Learning
We propose a simple class of deep reinforcement learning (RL) methods, called FactoredRL, that can leverage factored environment structures to improve the sample efficiency of existing model-based…
Intrinsically Motivated Goal-Conditioned Reinforcement Learning: a Short Survey
A typology of methods at the intersection of deep RL and developmental approaches is proposed, in which deep RL algorithms are trained to tackle the developmental robotics problem of the autonomous acquisition of open-ended repertoires of skills.
Measuring Sample Efficiency and Generalization in Reinforcement Learning Benchmarks: NeurIPS 2020 Procgen Benchmark
The design of a centralized benchmark for reinforcement learning is presented, which can help measure sample efficiency and generalization by performing end-to-end evaluation of the training and rollout phases of thousands of user-submitted code bases in a scalable way.
Stabilizing Transformers for Reinforcement Learning
The proposed architecture, the Gated Transformer-XL (GTrXL), surpasses LSTMs on challenging memory environments and achieves state-of-the-art results on the multi-task DMLab-30 benchmark suite, exceeding the performance of an external memory architecture.
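As context for the GTrXL entry above, the fragment below sketches the general idea of replacing a transformer sublayer's plain residual connection with a learned gate. The sigmoid-gated interpolation shown here is a simplification chosen for illustration; the paper studies several gating functions (including GRU-style gates), so treat this as an assumption-level sketch rather than the published architecture.

import torch
import torch.nn as nn

class GatedResidual(nn.Module):
    """Sketch of a gated residual connection around a transformer sublayer,
    in the spirit of GTrXL. The gating form here is an assumption."""

    def __init__(self, d_model=128):
        super().__init__()
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, x, sublayer_out):
        # x: input to the sublayer (skip path); sublayer_out: e.g. attention output
        g = torch.sigmoid(self.gate(torch.cat([x, sublayer_out], dim=-1)))
        return g * sublayer_out + (1.0 - g) * x   # learned interpolation instead of a plain residual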

References

Showing 1-10 of 44 references
An investigation of model-free planning
It is demonstrated empirically that an entirely model-free approach, without special structure beyond standard neural network components such as convolutional networks and LSTMs, can learn to exhibit many of the characteristics typically associated with a model-based planner.
BabyAI: A Platform to Study the Sample Efficiency of Grounded Language Learning
The BabyAI research platform is introduced to support investigations towards including humans in the loop for grounded language learning and puts forward strong evidence that current deep learning methods are not yet sufficiently sample efficient when it comes to learning a language with compositional properties.
Relational recurrent neural networks
A new memory module, a Relational Memory Core (RMC), is used which employs multi-head dot product attention to allow memories to interact, and achieves state-of-the-art results on the WikiText-103, Project Gutenberg, and GigaWord datasets.
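To make the RMC entry above more concrete, here is a loose sketch of a fixed set of memory slots interacting through multi-head dot-product attention. The slot count, gating, and exact update rule are assumptions; the actual RMC differs in detail.

import torch
import torch.nn as nn

class SimpleRelationalMemory(nn.Module):
    """Loose sketch of relational memory: slots attend to each other and to new
    input, then are updated through a learned gate. Details are assumptions."""

    def __init__(self, d_model=64, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, memory, x):
        # memory: (batch, num_slots, d_model); x: (batch, 1, d_model) new input
        keys = torch.cat([memory, x], dim=1)
        attended, _ = self.attn(memory, keys, keys)   # slots query each other and the input
        g = torch.sigmoid(self.gate(torch.cat([memory, attended], dim=-1)))
        return g * attended + (1 - g) * memory        # gated memory update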
Control of Memory, Active Perception, and Action in Minecraft
These tasks are designed to emphasize, in a controllable manner, issues that pose challenges for RL methods, including partial observability, delayed rewards, high-dimensional visual observations, and the need to use active perception correctly in order to perform well in the tasks.
A Recurrent Latent Variable Model for Sequential Data
It is argued that through the use of high-level latent random variables, the variational RNN (VRNN) can model the kind of variability observed in highly structured sequential data such as natural speech.
Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
Qualitatively, the proposed RNN Encoder-Decoder model learns a semantically and syntactically meaningful representation of linguistic phrases.
Recurrent Independent Mechanisms
Recurrent Independent Mechanisms is proposed, a new recurrent architecture in which multiple groups of recurrent cells operate with nearly independent transition dynamics, communicate only sparingly through the bottleneck of attention, and are only updated at time steps where they are most relevant.
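The entry above describes recurrent cells that update sparsely and communicate through attention; the sketch below captures only the sparse-update half of that idea and omits the inter-mechanism communication step. The scoring function, top-k rule, and all names are assumptions for illustration.

import torch
import torch.nn as nn

class TinyRIM(nn.Module):
    """Rough sketch: several small GRU cells, only the k most input-relevant of
    which update at each step. Omits the attention-based communication step."""

    def __init__(self, num_mechs=4, top_k=2, d_in=32, d_hidden=32):
        super().__init__()
        self.cells = nn.ModuleList(nn.GRUCell(d_in, d_hidden) for _ in range(num_mechs))
        self.score = nn.Linear(d_in + d_hidden, 1)
        self.top_k = top_k

    def forward(self, x, hiddens):
        # x: (batch, d_in); hiddens: list of (batch, d_hidden), one per mechanism
        scores = torch.stack(
            [self.score(torch.cat([x, h], dim=-1)).squeeze(-1) for h in hiddens], dim=-1)
        active = scores.topk(self.top_k, dim=-1).indices            # mechanisms allowed to update
        new_hiddens = []
        for i, (cell, h) in enumerate(zip(self.cells, hiddens)):
            mask = (active == i).any(dim=-1, keepdim=True).float()
            new_hiddens.append(mask * cell(x, h) + (1 - mask) * h)  # others keep their state
        return new_hiddens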
RTFM: Generalising to Novel Environment Dynamics via Reading
This work proposes a grounded policy learning problem, Read to Fight Monsters (RTFM), in which the agent must jointly reason over a language goal, relevant dynamics described in a document, and environment observations, and it procedurally generates environment dynamics and corresponding language descriptions of the dynamics.