On the Realization of Compositionality in Neural Networks
@article{Baan2019OnTR, title={On the Realization of Compositionality in Neural Networks}, author={Joris Baan and Jana Leible and Mitja Nikolaus and David Rau and Dennis Ulmer and Tim Baumg{\"a}rtner and Dieuwke Hupkes and Elia Bruni}, journal={ArXiv}, year={2019}, volume={abs/1906.01634} }
We present a detailed comparison of two types of sequence-to-sequence models trained to conduct a compositional task. The models are architecturally identical at inference time, but differ in the way that they are trained: our baseline model is trained with a task-success signal only, while the other model receives additional supervision on its attention mechanism (Attentive Guidance), which has been shown to be an effective method for encouraging more compositional solutions. We first confirm that…
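Attentive Guidance adds a supervised loss on the attention distribution on top of the usual token-level cross-entropy, which is the only training-time difference between the two models. Below is a minimal sketch, assuming a PyTorch sequence-to-sequence model whose decoder exposes its attention weights; the tensor names (`attention_weights`, `target_attention`) and the exact form of the guidance term are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def guided_loss(decoder_logits, target_tokens, attention_weights, target_attention,
                guidance_weight=1.0):
    """Task loss plus Attentive Guidance: supervise the attention distribution.

    decoder_logits:    (batch, tgt_len, vocab)   decoder output scores
    target_tokens:     (batch, tgt_len)          gold output tokens
    attention_weights: (batch, tgt_len, src_len) attention over encoder states
    target_attention:  (batch, tgt_len, src_len) gold alignment pattern (e.g. one-hot)
    """
    # Standard sequence-to-sequence objective: token-level cross-entropy.
    task_loss = F.cross_entropy(
        decoder_logits.reshape(-1, decoder_logits.size(-1)),
        target_tokens.reshape(-1),
    )
    # Attentive Guidance: push the attention distribution towards the gold
    # alignment with a cross-entropy term over source positions.
    guidance_loss = -(target_attention * torch.log(attention_weights + 1e-9)).sum(dim=-1).mean()
    return task_loss + guidance_weight * guidance_loss
```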
14 Citations
Transcoding Compositionally: Using Attention to Find More Generalizable Solutions
- Computer Science, BlackboxNLP@ACL
- 2019
This paper presents seq2attn, a new architecture that is specifically designed to exploit attention to find compositional patterns in the input, and exhibits overgeneralization to a larger degree than a standard sequence-to-sequence model.
Recursive Decoding: A Situated Cognition Approach to Compositional Generation in Grounded Language Understanding
- Computer Science, ArXiv
- 2022
Recursive Decoding (RD) is presented, a novel procedure for training and using seq2seq models, targeted towards decode-side generalization, which yields dramatic improvement on two previously neglected generalization tasks in gSCAN.
Assessing Representation and Composition in Deep Language Models (PhD proposal, The University of Chicago)
- Computer Science
- 2020
This paper proposes various intrinsic and extrinsic tasks aimed at removing superficial cues and isolating abstract composition signals, and finds that phrasal composition in these models relies heavily on word content; preventing inference on word content degrades performance to slightly above chance.
Compositionality as Directional Consistency in Sequential Neural Networks
- Computer Science
- 2019
An exploratory study comparing the abilities of SRNs and GRUs to make compositional generalizations, using adjective semantics as testing ground demonstrates that GRUs generalize more systematically than SRNs.
On the Interplay Between Fine-tuning and Composition in Transformers
- Computer Science, Findings
- 2021
It is found that fine-tuning largely fails to benefit compositionality in these representations, though training on sentiment yields a small, localized benefit for certain models.
Compositional memory in attractor neural networks with one-step learning
- Computer Science, Neural Networks
- 2021
Compositionality and Generalization In Emergent Languages
- Education, ACL
- 2020
It is concluded that compositionality does not arise from simple generalization pressure, but if an emergent language does chance upon it, it will be more likely to survive and thrive.
Assessing Phrasal Representation and Composition in Transformers
- Computer Science, EMNLP
- 2020
It is found that phrase representation in state-of-the-art pre-trained transformers relies heavily on word content, with little evidence of nuanced composition.
References
Learning compositionally through attentive guidance
- Computer Science, ArXiv
- 2018
Attentive Guidance, a mechanism to direct a sequence-to-sequence model equipped with attention to find more compositional solutions, is introduced, and it is shown that vanilla sequence-to-sequence models with attention overfit the training distribution, while the guided versions come up with compositional solutions that fit the training and testing distributions almost equally well.
Transcoding Compositionally: Using Attention to Find More Generalizable Solutions
- Computer Science, BlackboxNLP@ACL
- 2019
This paper presents seq2attn, a new architecture that is specifically designed to exploit attention to find compositional patterns in the input, and exhibits overgeneralization to a larger degree than a standard sequence-to-sequence model.
Memorize or generalize? Searching for a compositional RNN in a haystack
- Computer Science, ArXiv
- 2018
This paper proposes the lookup table composition domain as a simple setup to test compositional behaviour and shows that it is theoretically possible for a standard RNN to learn to behave compositionally in this domain when trained with standard gradient descent and provided with additional supervision.
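As a concrete illustration of what such a lookup table domain can look like, the sketch below generates random bijective tables over 3-bit strings and composes them. The bit length, table names, and left-to-right application order are assumptions made for this example, not necessarily the cited paper's exact setup.

```python
import itertools
import random

def make_lookup_tables(num_tables=8, bits=3, seed=0):
    """Create random bijective lookup tables over all length-`bits` bit strings."""
    rng = random.Random(seed)
    inputs = [''.join(b) for b in itertools.product('01', repeat=bits)]
    tables = {}
    for i in range(num_tables):
        outputs = inputs[:]
        rng.shuffle(outputs)
        tables[f't{i + 1}'] = dict(zip(inputs, outputs))
    return tables

def apply_composition(tables, task, argument):
    """Apply a sequence of tables left to right, e.g. task=('t1', 't2')."""
    value = argument
    for name in task:
        value = tables[name][value]
    return value

tables = make_lookup_tables()
print(apply_composition(tables, ('t1', 't2'), '001'))  # i.e. t2(t1('001'))
```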
Diagnostic classification and symbolic guidance to understand and improve recurrent neural networks
- Computer Science
- 2017
A variety of methods for inspecting and understanding the internal dynamics of gated recurrent neural networks are explored on a task that targets a key feature of language, hierarchical compositionality of meaning, yielding a detailed understanding of the computations the networks implement to execute the task.
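A diagnostic classifier in this sense is a simple probe trained on a network's hidden states to predict the value a hypothesised symbolic strategy would track at each step. A minimal sketch with scikit-learn, using hypothetical arrays `hidden_states` and `hypothesised_values` in place of states extracted from a trained network:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical data: one hidden-state vector per timestep, and the value that a
# hypothesised symbolic strategy (e.g. a running subtotal) would hold there.
hidden_states = np.random.randn(5000, 256)             # (timesteps, hidden_size)
hypothesised_values = np.random.randint(0, 10, 5000)   # discretised target variable

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, hypothesised_values, test_size=0.2, random_state=0)

# If the probe predicts the variable well above chance, the network plausibly
# tracks it; near-chance accuracy is evidence against that hypothesis.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print('diagnostic accuracy:', probe.score(X_test, y_test))
```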
What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models
- Computer Science, Biology, AAAI
- 2019
This work conducts a comprehensive analysis of individual neurons and proposes two methods: Linguistic Correlation Analysis, a supervised method that extracts the neurons most relevant to an extrinsic task, and Cross-model Correlation Analysis, an unsupervised method that extracts neurons salient with respect to the model itself.
Visualizing and Understanding Recurrent Networks
- Computer Science, ArXiv
- 2015
This work uses character-level language models as an interpretable testbed to provide an analysis of LSTM representations, predictions and error types, and reveals the existence of interpretable cells that keep track of long-range dependencies such as line lengths, quotes and brackets.
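This style of analysis amounts to running a trained character-level model over text and plotting individual unit activations against character position. A minimal PyTorch sketch of the mechanics, with an untrained `nn.LSTM` standing in for a trained language model and the inspected unit chosen arbitrarily (the cited work inspects memory cell values; reading per-step hidden states, as below, is a simplification):

```python
import torch
import torch.nn as nn

# Stand-in for a trained character-level language model (hypothetical, untrained).
embed = nn.Embedding(128, 32)             # ASCII vocabulary
lstm = nn.LSTM(32, 64, batch_first=True)

text = 'def f(x):\n    return (x + 1) * 2\n'
ids = torch.tensor([[ord(c) for c in text]])

# Collect the hidden state at every character so a single unit's activation can
# be plotted against position (e.g. to spot a unit tracking bracket depth).
outputs, _ = lstm(embed(ids))              # shape: (1, len(text), 64)
unit_trace = outputs[0, :, 7]              # activation of one arbitrary unit
for ch, act in zip(text, unit_trace.tolist()):
    print(repr(ch), round(act, 3))
```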
A Unified Approach to Interpreting Model Predictions
- Computer Science, NIPS
- 2017
A unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations), which unifies six existing methods and presents new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.
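The unifying object in that framework is the additive feature attribution model: a simplified binary input z' marks which of M features are present, and each feature i receives an attribution phi_i.

```latex
g(z') = \phi_0 + \sum_{i=1}^{M} \phi_i z'_i, \qquad z' \in \{0, 1\}^M
```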
Visualisation and 'diagnostic classifiers' reveal how recurrent and recursive neural networks process hierarchical structure
- Computer Science, IJCAI
- 2018
The results indicate that the networks follow a strategy similar to the hypothesised ‘cumulative strategy’, which explains the high accuracy of the network on novel expressions, the generalisation to longer expressions than seen in training, and the mild deterioration with increasing length.
Probing the Compositionality of Intuitive Functions
- Computer Science, NIPS
- 2016
It is argued that the compositional nature of intuitive functions is consistent with broad principles of human cognition, and shown that participants prefer compositional over non-compositional function extrapolations, and that samples from the human prior over functions are best described by a compositional model.
Adam: A Method for Stochastic Optimization
- Computer Science, ICLR
- 2015
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
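For reference, the Adam update for parameters theta, with gradient g_t, step size alpha, decay rates beta_1 and beta_2, and stabiliser epsilon:

```latex
m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2
```
```latex
\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad
\hat{v}_t = \frac{v_t}{1 - \beta_2^t}, \qquad
\theta_t = \theta_{t-1} - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
```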