Transcoding Compositionally: Using Attention to Find More Generalizable Solutions

@inproceedings{Korrel2019TranscodingCU,
  title={Transcoding Compositionally: Using Attention to Find More Generalizable Solutions},
  author={Kris Korrel and Dieuwke Hupkes and Verna Dankers and Elia Bruni},
  booktitle={BlackboxNLP@ACL},
  year={2019}
}
While sequence-to-sequence models have shown remarkable generalization power across several natural language tasks, the solutions they construct are argued to be less compositional than human-like generalization. In this paper, we present seq2attn, a new architecture that is specifically designed to exploit attention to find compositional patterns in the input. In seq2attn, the two standard components of an encoder-decoder model are connected via a transcoder that modulates the information…
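The abstract describes an encoder-decoder model whose two halves communicate only through a transcoder that modulates information flow via attention. Below is a minimal, hedged PyTorch sketch of that wiring; the GRU components, the plain softmax attention, the decision to attend over the input embeddings, and all names and sizes are illustrative assumptions for a toy version, not the authors' exact seq2attn configuration.

import torch
import torch.nn as nn


class Seq2AttnSketch(nn.Module):
    """Toy encoder -> transcoder -> decoder model in which the decoder
    receives only an attention-weighted summary of the input (assumed wiring)."""

    def __init__(self, vocab_in, vocab_out, hidden=64):
        super().__init__()
        self.embed_in = nn.Embedding(vocab_in, hidden)
        self.embed_out = nn.Embedding(vocab_out, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        # Transcoder: consumes the previous target token and produces the
        # query that attends over the encoder states.
        self.transcoder = nn.GRUCell(hidden, hidden)
        # Decoder: sees only the attention context, so all information about
        # the input has to pass through the attention bottleneck.
        self.decoder = nn.GRUCell(hidden, hidden)
        self.out = nn.Linear(hidden, vocab_out)

    def forward(self, src, tgt):
        enc_in = self.embed_in(src)                  # (B, S, H)
        enc_states, h_n = self.encoder(enc_in)       # (B, S, H), (1, B, H)
        trans_h = h_n.squeeze(0)                     # transcoder starts from the encoder summary
        dec_h = torch.zeros_like(trans_h)            # decoder starts uninformed
        logits = []
        for t in range(tgt.size(1)):                 # teacher forcing on target tokens
            trans_h = self.transcoder(self.embed_out(tgt[:, t]), trans_h)
            # Attention of the transcoder state over the encoder states ...
            scores = torch.bmm(enc_states, trans_h.unsqueeze(2)).squeeze(2)   # (B, S)
            weights = torch.softmax(scores, dim=1)
            # ... applied to the input embeddings: the only signal the decoder gets.
            context = torch.bmm(weights.unsqueeze(1), enc_in).squeeze(1)      # (B, H)
            dec_h = self.decoder(context, dec_h)
            logits.append(self.out(dec_h))
        return torch.stack(logits, dim=1)            # (B, T, vocab_out)

For example, with vocab_in=20 and vocab_out=15, a (2, 7) batch of source indices and a (2, 5) batch of target indices yield logits of shape (2, 5, 15). The point of this design is that the decoder never sees the raw encoder states, only what the transcoder's attention lets through.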

Citations

On the Realization of Compositionality in Neural Networks
TLDR
It is confirmed that the models with attentive guidance indeed infer more compositional solutions than the baseline, and analysis of the structural differences between the two model types indicates that guided networks exhibit a more modular structure with a small number of specialized, strongly connected neurons.
The Paradox of the Compositionality of Natural Language: A Neural Machine Translation Case Study
TLDR
This work re-instantiates three compositionality tests from the literature and reformulates them for neural machine translation (NMT), rethinking the evaluation of compositionality in neural networks and developing benchmarks that use real data to evaluate compositionality on natural language, where composing meaning is not as straightforward as doing the math.
Assessing Incrementality in Sequence-to-Sequence Models
TLDR
This work proposes three novel metrics to assess the behavior of RNNs with and without an attention mechanism and identifies key differences in the way the different model types process sentences.
Compositionality as Directional Consistency in Sequential Neural Networks
TLDR
An exploratory study comparing the abilities of SRNs and GRUs to make compositional generalizations, using adjective semantics as a testing ground, demonstrates that GRUs generalize more systematically than SRNs.
How BPE Affects Memorization in Transformers
TLDR
It is demonstrated that the size of the subword vocabulary learned by Byte-Pair Encoding greatly affects both the ability and the tendency of standard Transformer models to memorize training data, even when controlling for the number of learned parameters.
LSTMS Compose — and Learn — Bottom-Up
TLDR
These synthetic experiments support a specific hypothesis about how hierarchical structures are discovered over the course of training: that LSTM constituent representations are learned bottom-up, relying on effective representations of their shorter children, rather than on learning the longer-range relations independently.
The compositionality of neural networks: integrating symbolism and connectionism
TLDR
A set of tests is proposed that provides a bridge between the vast amount of linguistic and philosophical theory about compositionality and the successful neural models of language, and the resulting tests are applied to three popular sequence-to-sequence models.
Paired Examples as Indirect Supervision in Latent Decision Models
TLDR
A way to leverage paired examples that provide stronger cues for learning latent decisions is introduced; it improves both in- and out-of-distribution generalization and leads to correct latent decision predictions.

References

Showing 1-10 of 25 references
Learning compositionally through attentive guidance
TLDR
Attentive Guidance, a mechanism to direct a sequence-to-sequence model equipped with attention to find more compositional solutions, is introduced, and it is shown that vanilla sequence-to-sequence models with attention overfit the training distribution, while the guided versions come up with compositional solutions that fit the training and testing distributions almost equally well.
On the Realization of Compositionality in Neural Networks
TLDR
It is confirmed that the models with attentive guidance indeed infer more compositional solutions than the baseline, and analysis of the structural differences between the two model types indicates that guided networks exhibit a more modular structure with a small number of specialized, strongly connected neurons.
Rearranging the Familiar: Testing Compositional Generalization in Recurrent Networks
Systematic compositionality is the ability to recombine meaningful units with regular and predictable outcomes, and it is seen as key to the human capacity for generalization in language. Recent work…
Memorize or generalize? Searching for a compositional RNN in a haystack
TLDR
This paper proposes the lookup table composition domain as a simple setup to test compositional behaviour and shows that it is theoretically possible for a standard RNN to learn to behave compositionally in this domain when trained with standard gradient descent and provided with additional supervision.
Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks
TLDR
This paper introduces the SCAN domain, consisting of a set of simple compositional navigation commands paired with the corresponding action sequences, and tests the zero-shot generalization capabilities of a variety of recurrent neural networks trained on SCAN with sequence-to-sequence methods.
Attention is All you Need
TLDR
The Transformer, a new simple network architecture based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, applying successfully to English constituency parsing with both large and limited training data.
Compositional Attention Networks for Machine Reasoning
TLDR
The MAC network is presented, a novel fully differentiable neural network architecture designed to facilitate explicit and expressive reasoning; it is computationally efficient and data efficient, in particular requiring 5x less data than existing models to achieve strong results.
Sequence to Sequence Learning with Neural Networks
TLDR
This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short-term dependencies between the source and the target sentence, which made the optimization problem easier.
Can LSTM Learn to Capture Agreement? The Case of Basque
TLDR
It is found that sequential models perform worse on agreement prediction in Basque than one might expect on the basis of previous agreement prediction work in English.
Human-level concept learning through probabilistic program induction
TLDR
A computational model is described that learns concepts in a human-like fashion and does so better than current deep learning algorithms; it can generate new letters of the alphabet that look "right" as judged by Turing-like tests comparing the model's output with what real humans produce.