On the Realization of Compositionality in Neural Networks

@article{Baan2019OnTR,
  title={On the Realization of Compositionality in Neural Networks},
  author={Joris Baan and Jana Leible and Mitja Nikolaus and David Rau and Dennis Ulmer and Tim Baumg{\"a}rtner and Dieuwke Hupkes and Elia Bruni},
  journal={ArXiv},
  year={2019},
  volume={abs/1906.01634}
}
We present a detailed comparison of two types of sequence-to-sequence models trained to conduct a compositional task. The models are architecturally identical at inference time, but differ in the way that they are trained: our baseline model is trained with a task-success signal only, while the other model receives additional supervision on its attention mechanism (Attentive Guidance), which has been shown to be an effective method for encouraging more compositional solutions. We first confirm that…
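As a rough illustration of the training difference, the sketch below adds an auxiliary attention loss to a standard cross-entropy objective. It is a minimal PyTorch sketch under the assumption that gold alignments are available as one distribution per decoder step; the tensor shapes, the cross-entropy form of the guidance term, and the weighting factor guidance_weight are illustrative choices, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def attentive_guidance_loss(attn_weights, target_alignment, eps=1e-8):
    # attn_weights:     (batch, tgt_len, src_len) attention distributions from the decoder
    # target_alignment: (batch, tgt_len, src_len) gold alignment, each row a distribution
    # Cross-entropy between the gold alignment and the model's attention.
    return -(target_alignment * torch.log(attn_weights + eps)).sum(dim=-1).mean()

def total_loss(logits, targets, attn_weights, target_alignment, guidance_weight=1.0):
    # Task-success signal: token-level cross-entropy on the output sequence.
    # logits: (batch, tgt_len, vocab), targets: (batch, tgt_len) of class indices.
    task = F.cross_entropy(logits.transpose(1, 2), targets)
    # Additional supervision on the attention mechanism (the guidance signal).
    guidance = attentive_guidance_loss(attn_weights, target_alignment)
    return task + guidance_weight * guidance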

Citations

Transcoding Compositionally: Using Attention to Find More Generalizable Solutions
TLDR
This paper presents seq2attn, a new architecture that is specifically designed to exploit attention to find compositional patterns in the input and that exhibits overgeneralization to a larger degree than a standard sequence-to-sequence model.
Recursive Decoding: A Situated Cognition Approach to Compositional Generation in Grounded Language Understanding
TLDR
Recursive Decoding (RD) is presented, a novel procedure for training and using seq2seq models, targeted towards decode-side generalization, which yields dramatic improvement on two previously neglected generalization tasks in gSCAN.
THE UNIVERSITY OF CHICAGO ASSESSING REPRESENTATION AND COMPOSITION IN DEEP LANGUAGE MODELS A PROPOSAL SUBMITTED TO THE FACULTY OF THE DIVISION OF THE PHYSICAL SCIENCE IN CANDIDACY FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
  • Computer Science
  • 2020
TLDR
This paper proposes various intrinsic and extrinsic tasks aimed at removing superficial cues and isolating abstract composition signals, and finds that phrasal composition in these models relies heavily on word content and that preventing inference on word content degrades performance to slightly above chance.
Compositionality as Directional Consistency in Sequential Neural Networks
TLDR
An exploratory study comparing the abilities of SRNs and GRUs to make compositional generalizations, using adjective semantics as a testing ground, demonstrates that GRUs generalize more systematically than SRNs.
THE UNIVERSITY OF CHICAGO ASSESSING COMPOSITION IN DEEP LANGUAGE MODELS A PROPOSAL SUBMITTED TO THE FACULTY OF THE DIVISION OF THE PHYSICAL SCIENCE IN CANDIDACY FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
  • Computer Science
  • 2020
TLDR
This paper proposes various intrinsic and extrinsic tasks aimed at removing superficial cues and isolating abstract composition signals, and finds that phrasal composition in these models relies heavily on word content and that preventing inference on word content degrades performance to slightly above chance.
On the Interplay Between Fine-tuning and Composition in Transformers
TLDR
It is found that fine-tuning largely fails to benefit compositionality in these representations, though training on sentiment yields a small, localized benefit for certain models.
Compositional memory in attractor neural networks with one-step learning
Compositionality and Generalization In Emergent Languages
TLDR
It is concluded that compositionality does not arise from simple generalization pressure, but if an emergent language does chance upon it, it will be more likely to survive and thrive.
Assessing Phrasal Representation and Composition in Transformers
TLDR
It is found that phrase representation in state-of-the-art pre-trained transformers relies heavily on word content, with little evidence of nuanced composition.

References

SHOWING 1-10 OF 16 REFERENCES
Learning compositionally through attentive guidance
TLDR
Attentive Guidance, a mechanism to direct a sequence-to-sequence model equipped with attention to find more compositional solutions, is introduced, and it is shown that vanilla sequence-to-sequence models with attention overfit the training distribution, while the guided versions come up with compositional solutions that fit the training and testing distributions almost equally well.
Transcoding Compositionally: Using Attention to Find More Generalizable Solutions
TLDR
This paper presents seq2attn, a new architecture that is specifically designed to exploit attention to find compositional patterns in the input and that exhibits overgeneralization to a larger degree than a standard sequence-to-sequence model.
Memorize or generalize? Searching for a compositional RNN in a haystack
TLDR
This paper proposes the lookup table composition domain as a simple setup to test compositional behaviour and shows that it is theoretically possible for a standard RNN to learn to behave compositionally in this domain when trained with standard gradient descent and provided with additional supervision.
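For context, a minimal sketch of what a lookup table composition domain might look like: atomic tables are random bijections over n-bit strings, and a composed task applies several of them in sequence. The table count, bit width, and prompt format below are assumptions for illustration only; the paper's exact setup may differ.

import itertools
import random

def make_lookup_tables(n_tables=8, n_bits=3, seed=0):
    # Each atomic table t1..t8 is a random bijection over all n-bit strings.
    rng = random.Random(seed)
    inputs = ["".join(bits) for bits in itertools.product("01", repeat=n_bits)]
    tables = {}
    for i in range(1, n_tables + 1):
        outputs = inputs[:]
        rng.shuffle(outputs)
        tables[f"t{i}"] = dict(zip(inputs, outputs))
    return tables

def compose(tables, names, x):
    # Apply the named tables left to right: compose(tables, ["t1", "t2"], "001") = t2(t1("001")).
    for name in names:
        x = tables[name][x]
    return x

tables = make_lookup_tables()
print(compose(tables, ["t1", "t2"], "001"))  # a 3-bit string, deterministic given the seed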
Diagnostic classification and symbolic guidance to understand and improve recurrent neural networks
TLDR
A search through a variety of methods to inspect and understand the internal dynamics of gated recurrent neural networks, using a task that focuses on a key feature of language, hierarchical compositionality of meaning, produces a detailed understanding of the computations the networks implement to execute their task.
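Diagnostic classification itself is simple to sketch: record hidden states from the trained network and fit a linear classifier that predicts a hypothesised feature from them. The file names and the choice of logistic regression below are hypothetical; they only illustrate the probing idea, not the paper's exact setup.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical dumps: hidden states recorded from the trained network at each
# timestep, and the value of the hypothesised symbolic feature at that timestep.
hidden_states = np.load("hidden_states.npy")   # shape (n_timesteps, hidden_size)
labels = np.load("labels.npy")                 # shape (n_timesteps,)

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, labels, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# High held-out accuracy is evidence that the feature is linearly decodable
# from the hidden state, i.e. that the network tracks it.
print("diagnostic accuracy:", probe.score(X_test, y_test))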
What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models
TLDR
This work presents a comprehensive analysis of individual neurons and proposes two methods: Linguistic Correlation Analysis, a supervised method to extract the most relevant neurons with respect to an extrinsic task, and Cross-model Correlation Analysis, an unsupervised method to extract salient neurons with respect to the model itself.
Visualizing and Understanding Recurrent Networks
TLDR
This work uses character-level language models as an interpretable testbed to provide an analysis of LSTM representations, predictions and error types, and reveals the existence of interpretable cells that keep track of long-range dependencies such as line lengths, quotes and brackets.
A Unified Approach to Interpreting Model Predictions
TLDR
A unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations), which unifies six existing methods and presents new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.
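A minimal usage sketch of the shap library's model-agnostic KernelExplainer, assuming a scikit-learn regressor as the model to be explained; the dataset and background-sample size are arbitrary choices for illustration.

import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True)
model = RandomForestRegressor(random_state=0).fit(X, y)

# KernelExplainer is the model-agnostic estimator: it approximates Shapley
# values by evaluating the model on perturbed inputs drawn from a background sample.
explainer = shap.KernelExplainer(model.predict, shap.sample(X, 50))
shap_values = explainer.shap_values(X[:5])   # additive per-feature attributions, shape (5, n_features)
print(shap_values.shape)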
Visualisation and 'diagnostic classifiers' reveal how recurrent and recursive neural networks process hierarchical structure
TLDR
The results indicate that the networks follow a strategy similar to the hypothesised ‘cumulative strategy’, which explains the high accuracy of the network on novel expressions, the generalisation to longer expressions than seen in training, and the mild deterioration with increasing length.
Probing the Compositionality of Intuitive Functions
TLDR
It is argued that the compositional nature of intuitive functions is consistent with broad principles of human cognition, and shown that participants prefer compositional over non-compositional function extrapolations, and that samples from the human prior over functions are best described by a compositional model.
Adam: A Method for Stochastic Optimization
TLDR
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
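The Adam update itself is compact enough to write out; the NumPy sketch below follows the standard formulation, with exponential moving averages of the gradient and its square plus bias correction. The toy objective and the learning rate used in the loop are illustrative choices.

import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient and its elementwise square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction for the zero-initialised moment estimates.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimise f(theta) = ||theta||^2, whose gradient is 2 * theta.
theta = np.array([1.0, -2.0])
m = v = np.zeros_like(theta)
for t in range(1, 1001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.01)
print(theta)  # approximately [0, 0]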