
Coordination Among Neural Modules Through a Shared Global Workspace

@article{Goyal2021CoordinationAN,
  title={Coordination Among Neural Modules Through a Shared Global Workspace},
  author={Anirudh Goyal and Aniket Didolkar and Alex Lamb and Kartikeya Badola and Nan Rosemary Ke and Nasim Rahaman and Jonathan Binas and Charles Blundell and Michael C. Mozer and Yoshua Bengio},
  journal={ArXiv},
  year={2021},
  volume={abs/2103.01197}
}
Deep learning has seen a movement away from representing examples with a monolithic hidden state towards a richly structured state. For example, Transformers segment by position, and object-centric architectures decompose images into entities. In all these architectures, interactions between different elements are modeled via pairwise interactions: Transformers make use of self-attention to incorporate information from other positions and object-centric architectures make use of graph neural networks…
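
As orientation for the contrast the abstract draws, here is a minimal sketch (not the authors' implementation) of pairwise self-attention among module states versus communication through a small set of shared workspace slots that modules write to and read from via attention. The slot count, dimensions, and single-step write/read are illustrative assumptions.

import torch
import torch.nn.functional as F

def attend(q, k, v):
    # Scaled dot-product attention: softmax(q k^T / sqrt(d)) v
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v

# n module states of dimension d (e.g. Transformer positions or object slots)
n, d, n_slots = 8, 32, 4
modules = torch.randn(n, d)

# (a) Pairwise interaction: every module attends to every other module, O(n^2)
pairwise_update = attend(modules, modules, modules)

# (b) Shared workspace: a small set of slots is written by the modules and
#     then broadcast back to them, O(n * n_slots)  (illustrative slots)
workspace = torch.randn(n_slots, d)
workspace = attend(workspace, modules, modules)           # write: slots attend to modules
broadcast_update = attend(modules, workspace, workspace)  # read: modules attend to slots

print(pairwise_update.shape, broadcast_update.shape)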

Luna: Linear Unified Nested Attention

Luna is proposed, a linear unified nested attention mechanism that approximates softmax attention with two nested linear attention functions, yielding only linear time and space complexity.
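
A rough sketch of the nested idea as summarized here: attention is routed through an auxiliary sequence of fixed length l, so each of the two attention calls costs O(n·l) rather than O(n²). The actual Luna layer uses specific linear attention functions and additional projections; the plain softmax attention and names below are simplifications.

import torch
import torch.nn.functional as F

def attend(q, k, v):
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v

n, l, d = 128, 16, 64          # sequence length, fixed auxiliary length, model dim
x = torch.randn(n, d)          # input sequence
p = torch.randn(l, d)          # fixed-length auxiliary ("packed") sequence

# First (nested) attention: pack the length-n input into l slots, O(n * l)
packed = attend(p, x, x)
# Second attention: each input position attends to the packed slots, O(n * l)
y = attend(x, packed, packed)
print(y.shape)   # (n, d): same shape as the input, with no n x n attention map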

A Short Survey of Systematic Generalization

This survey covers systematic generalization and how machine learning addresses it, and looks into systematic generalization in the language, vision, and VQA fields.

Compositional Attention: Disentangling Search and Retrieval

This work proposes a novel attention mechanism, called Compositional Attention, that replaces the standard head structure, and demonstrates that it outperforms standard multi-head attention on a variety of tasks, including some out-of-distribution settings.

Simplicial Embeddings in Self-Supervised Learning and Downstream Classification

It is formally proven that the SEM representation leads to better generalization than an unnormalized representation, and it is empirically demonstrated that SSL methods trained with SEMs have improved generalization on natural image datasets such as CIFAR-100 and ImageNet.
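
A minimal sketch of the SEM idea as summarized here: the encoder output is split into groups and each group is passed through a softmax, so the representation lies on a product of simplices. The group count and temperature are illustrative assumptions.

import torch
import torch.nn.functional as F

def simplicial_embedding(z, num_groups, temperature=1.0):
    """Map feature vectors onto a product of simplices by applying a softmax
    independently to each contiguous group of coordinates."""
    b, d = z.shape
    assert d % num_groups == 0
    z = z.view(b, num_groups, d // num_groups)
    return F.softmax(z / temperature, dim=-1).view(b, d)

features = torch.randn(4, 128)                  # e.g. output of an SSL encoder
sem = simplicial_embedding(features, num_groups=16)
print(sem.view(4, 16, 8).sum(-1))               # each group sums to 1 (lies on a simplex)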

Continual Learning via Local Module Composition

This work introduces local module composition (LMC), an approach to modular CL where each module is provided a local structural component that estimates a module’s relevance to the input, and demonstrates that agnosticity to task identities (IDs) arises from (local) structural learning that is module-specific, as opposed to task- and/or model-specific as in previous works.
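
A hedged sketch of the local gating idea described above: each module carries its own small scorer that estimates its relevance to the input, and module outputs are combined according to those scores. How the structural components are trained and how modules are added are in the paper; the classes, layer sizes, and softmax combination below are assumptions for illustration.

import torch
import torch.nn as nn

class LocalModule(nn.Module):
    """A module paired with its own local 'structural' scorer that estimates
    how relevant the module is to a given input."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(d_in, d_out), nn.ReLU())
        self.relevance = nn.Linear(d_in, 1)   # local structural component (illustrative)

    def forward(self, x):
        return self.f(x), self.relevance(x)

modules = nn.ModuleList([LocalModule(16, 32) for _ in range(3)])
x = torch.randn(8, 16)

outputs, scores = zip(*(m(x) for m in modules))
weights = torch.softmax(torch.cat(scores, dim=-1), dim=-1)               # (8, 3)
composed = (torch.stack(outputs, dim=1) * weights.unsqueeze(-1)).sum(1)  # weighted mix
print(composed.shape)   # (8, 32)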

Stateful active facilitator: Coordination and Environmental Heterogeneity in Cooperative Multi-Agent Reinforcement Learning

The need for efficient coordination among the agents further aggravates the problem of learning in multi-agent settings.

Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning

The proposed approach hopes to gain the expressiveness of the Transformer, while encouraging better compression and structuring of representations in the slow stream, and shows the benefits of the proposed method in terms of improved sample efficiency and generalization performance as compared to various competitive baselines.
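
A loose sketch of the fast/slow two-stream pattern this summary refers to: a fast stream processes the sequence chunk by chunk, conditioned on a small slow state, and the slow state is updated only once per chunk. The real architecture stacks Transformer layers and uses additional components; chunk size, slot count, and the single attention calls below are illustrative assumptions.

import torch
import torch.nn.functional as F

def attend(q, k, v):
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v

d, chunk, n_slow = 32, 8, 4
x = torch.randn(64, d)                 # input sequence
slow = torch.randn(n_slow, d)          # slow stream: a few latent state vectors

outputs = []
for t in range(0, x.shape[0], chunk):
    fast = x[t:t + chunk]
    # Fast stream: attention within the chunk, conditioned on the slow state
    kv = torch.cat([fast, slow])
    fast = attend(fast, kv, kv)
    # Slow stream: updated only once per chunk, from the fast representations
    slow = slow + attend(slow, fast, fast)
    outputs.append(fast)

y = torch.cat(outputs)
print(y.shape)   # (64, d)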

Coordinating Policies Among Multiple Agents via an Intelligent Communication Channel

This paper proposes an alternative approach whereby agents communicate through an intelligent facilitator that learns to sift through and interpret signals provided by all agents to improve the agents’ collective performance.

Discrete-Valued Neural Communication in Structured Architectures Enhances Generalization

The experiments show that discrete-valued neural communication (DVNC) substantially improves systematic generalization in a variety of architectures—transformers, modular architectures, and graph neural networks, and the DVNC is robust to the choice of hyperparameters, making the method useful in practice.

Discrete-Valued Neural Communication

The hypothesis that restricting the transmitted information among components to discrete representations is a beneficial bottleneck is explored and a theoretical justification of the discretization process is established, proving that it has the ability to increase noise robustness and reduce the underlying dimensionality of the model.
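
A minimal sketch of the discretization bottleneck described in the two entries above: each continuous message passed between components is snapped to its nearest entry in a shared codebook. The papers additionally split vectors into segments and use a straight-through estimator during training; the sizes below and the plain nearest-neighbour step are illustrative assumptions.

import torch

def discretize(messages, codebook):
    """Map each continuous message vector to its nearest codebook entry
    (a VQ-style discretization of the communicated values; no gradient handling)."""
    dist = torch.cdist(messages, codebook)   # (n_messages, codebook_size)
    idx = dist.argmin(dim=-1)                # discrete symbol per message
    return codebook[idx], idx

codebook = torch.randn(16, 32)               # 16 shared discrete "symbols" (illustrative)
messages = torch.randn(5, 32)                # e.g. attention results passed between modules
quantized, symbols = discretize(messages, codebook)
print(symbols)           # which discrete symbol each message was mapped to
print(quantized.shape)   # (5, 32): the discretized communication passed onward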

References


Relational recurrent neural networks

A new memory module, a Relational Memory Core (RMC), is used which employs multi-head dot product attention to allow memories to interact and achieves state-of-the-art results on the WikiText-103, Project Gutenberg, and GigaWord datasets.
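
A minimal sketch of the interaction step described above: memory slots attend over themselves and the current input with multi-head attention, then update residually. The real RMC adds gating and an MLP; slot count, dimensions, and the bare residual update are illustrative assumptions.

import torch
import torch.nn as nn

d, n_slots = 64, 8
attention = nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)

memory = torch.randn(1, n_slots, d)     # memory slots (batch of 1)
x = torch.randn(1, 1, d)                # new input at this time step

# Memories interact with each other and with the input via multi-head attention
kv = torch.cat([memory, x], dim=1)
updated, _ = attention(memory, kv, kv)
memory = memory + updated               # residual update (the real RMC also gates this)
print(memory.shape)                     # (1, 8, 64)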

Image Transformer

This work generalizes a recently proposed model architecture based on self-attention, the Transformer, to a sequence modeling formulation of image generation with a tractable likelihood, and significantly increases the size of images the model can process in practice, despite maintaining significantly larger receptive fields per layer than typical convolutional neural networks.

Attention is All you Need

A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.
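
The core operation this entry refers to is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A minimal standalone version (the full model adds multi-head projections, masking, and positional encodings):

import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)
    return F.softmax(scores, dim=-1) @ V

Q = torch.randn(10, 64)   # queries
K = torch.randn(10, 64)   # keys
V = torch.randn(10, 64)   # values
print(scaled_dot_product_attention(Q, K, V).shape)   # (10, 64)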

Adam: A Method for Stochastic Optimization

This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
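
The adaptive moment estimates mentioned above follow the standard Adam update: exponential moving averages of the gradient and its square, bias-corrected, then a per-coordinate step. A minimal version (omitting the full algorithm's surrounding details) on a toy objective:

import torch

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: adaptive estimates of the first and second moments
    of the gradient, with bias correction."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)            # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)            # bias-corrected second moment
    param = param - lr * m_hat / (v_hat.sqrt() + eps)
    return param, m, v

theta = torch.randn(4)
m, v = torch.zeros(4), torch.zeros(4)
for t in range(1, 6):
    grad = 2 * theta                        # gradient of the toy objective ||theta||^2
    theta, m, v = adam_step(theta, grad, m, v, t)
print(theta)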

Transformers with Competitive Ensembles of Independent Mechanisms

This work proposes Transformers with Independent Mechanisms (TIM), a new Transformer layer which divides the hidden representation and parameters into multiple mechanisms, which only exchange information through attention, and proposes a competition mechanism which encourages these mechanisms to specialize over time steps, and thus be more independent.

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
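
The "16x16 words" in the title refer to tokenizing an image into fixed-size patches that are linearly projected and fed to a standard Transformer. A minimal sketch of that patch embedding (the real model learns the class token and adds positional embeddings):

import torch
import torch.nn as nn

# Turn an image into a sequence of 16x16 patch embeddings, ViT-style
patch, d = 16, 768
to_patches = nn.Conv2d(3, d, kernel_size=patch, stride=patch)   # one linear projection per patch

img = torch.randn(1, 3, 224, 224)
tokens = to_patches(img).flatten(2).transpose(1, 2)   # (1, 196, 768): 14x14 patches
cls = torch.zeros(1, 1, d)                            # placeholder for the learnable class token
tokens = torch.cat([cls, tokens], dim=1)              # (1, 197, 768), fed to a standard Transformer
print(tokens.shape)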

PyTorch: An Imperative Style, High-Performance Deep Learning Library

This paper details the principles that drove the implementation of PyTorch and how they are reflected in its architecture, and explains how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance.

The StarCraft Multi-Agent Challenge

The StarCraft Multi-Agent Challenge (SMAC), based on the popular real-time strategy game StarCraft II, is proposed as a benchmark problem, and an open-source deep multi-agent RL framework including state-of-the-art algorithms is released.

Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks

This work presents an attention-based neural network module, the Set Transformer, specifically designed to model interactions among elements in the input set, and reduces the computation time of self-attention from quadratic to linear in the number of elements in the set.
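
A minimal sketch of how the quadratic-to-linear reduction mentioned above can work: a small set of inducing points attends to the input set, and the set then attends back to that summary, giving two O(n·m) attentions instead of one O(n²) self-attention. The actual block wraps this in multi-head attention with residual and feed-forward layers; the sizes and bare attention calls below are illustrative assumptions.

import torch
import torch.nn.functional as F

def attend(q, k, v):
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v

n, m, d = 1000, 16, 64
x = torch.randn(n, d)              # a set of n elements
inducing = torch.randn(m, d)       # m inducing points, m << n (learnable in the real model)

h = attend(inducing, x, x)         # inducing points summarize the set, O(n * m)
y = attend(x, h, h)                # the set attends back to the summary, O(n * m)
print(y.shape)                     # (n, d)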

What is consciousness, and could machines have it?

It is argued that despite their recent successes, current machines are still mostly implementing computations that reflect unconscious processing in the human brain, and the word “consciousness” conflates two different types of information-processing computations in the brain.
...