Visualisation and 'diagnostic classifiers' reveal how recurrent and recursive neural networks process hierarchical structure

@article{Hupkes2018VisualisationA,
  title={Visualisation and 'diagnostic classifiers' reveal how recurrent and recursive neural networks process hierarchical structure},
  author={Dieuwke Hupkes and Willem H. Zuidema},
  journal={J. Artif. Intell. Res.},
  year={2018},
  volume={61},
  pages={907--926}
}
In this paper, we investigate how recurrent neural networks can learn and process languages with hierarchical, compositional semantics. To this end, we define the artificial task of processing nested arithmetic expressions, and study whether different types of neural networks can learn to compute their meaning. We find that simple recurrent networks cannot find a generalising solution to this task, but gated recurrent neural networks perform surprisingly well: networks learn to predict the… 
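
To make the task concrete, the sketch below computes the meaning of a nested arithmetic expression in two ways. The first recurses over the bracket structure, mirroring a tree-structured computation. The second makes a single left-to-right pass with a running total and a stack of signs, one incremental strategy that a network reading the expression token by token could in principle implement. The token format (space-separated integers, `+`, `-`, and brackets) and all function names are assumptions for illustration, not the paper's exact specification.

```python
from typing import List


def tokenize(expr: str) -> List[str]:
    # Assumed format: space-separated tokens, e.g. "( 5 - ( 2 + 3 ) )"
    return expr.split()


def eval_recursive(tokens: List[str]) -> int:
    """Evaluate by recursing into each bracketed sub-expression (tree-structured)."""
    pos = 0

    def parse_operand() -> int:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_sequence()
            assert tokens[pos] == ")", "unbalanced brackets"
            pos += 1
            return value
        return int(tok)  # integer literal, possibly negative such as "-3"

    def parse_sequence() -> int:
        nonlocal pos
        value = parse_operand()
        # Left-associative chain of + and - at the current bracket level
        while pos < len(tokens) and tokens[pos] in ("+", "-"):
            op = tokens[pos]
            pos += 1
            rhs = parse_operand()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = parse_sequence()
    assert pos == len(tokens), "trailing tokens"
    return result


def eval_incremental(tokens: List[str]) -> int:
    """Single left-to-right pass: a running total plus a stack of signs, no tree built."""
    total = 0
    sign = 1            # sign applied to the next number (+1 or -1)
    sign_stack = [1]    # effective sign of each open bracket level
    for tok in tokens:
        if tok == "(":
            sign_stack.append(sign)   # the whole group inherits the current sign
        elif tok == ")":
            sign_stack.pop()
        elif tok == "+":
            sign = sign_stack[-1]
        elif tok == "-":
            sign = -sign_stack[-1]
        else:
            total += sign * int(tok)
    return total


if __name__ == "__main__":
    for expr in ["( 5 - ( 2 + 3 ) )", "( ( 5 + 2 ) - 3 )", "( 8 - ( 4 - ( 1 - 2 ) ) )"]:
        toks = tokenize(expr)
        print(expr, "=", eval_recursive(toks), eval_incremental(toks))
```

Both functions return the same value; the difference is that the incremental version never needs to revisit earlier tokens, only to remember which sign each open bracket imposes.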

Diagnostic classification and symbolic guidance to understand and improve recurrent neural networks

Searching through a variety of methods to inspect and understand the internal dynamics of gated recurrent neural networks, using a task that focuses on a key feature of language (the hierarchical compositionality of meaning), yields a detailed understanding of the computations the networks implement to execute their task.

Discovering the Compositional Structure of Vector Representations with Role Learning Networks

A novel analysis technique called ROLE is used to show that recurrent neural networks perform well on compositional tasks by converging to solutions that implicitly represent symbolic structure; ROLE also uncovers a symbolic structure that closely approximates the encodings of a standard seq2seq network trained to perform the compositional SCAN task.

Do Neural Models Learn Systematicity of Monotonicity Inference in Natural Language?

This work introduces a method for evaluating whether neural models can learn the systematicity of monotonicity inference in natural language, namely, the regularity for performing arbitrary inferences with generalization on composition.

Understanding Memory Modules on Learning Simple Algorithms

This work applies a two-step analysis pipeline: first, inferring a hypothesis about what strategy the model has learned from visualizations; then, verifying it with a newly proposed qualitative analysis method based on dimensionality reduction.

Under the Hood: Using Diagnostic Classifiers to Investigate and Improve how Language Models Track Agreement Information

It is shown that ‘diagnostic classifiers’, trained to predict number from the internal states of a language model, provide a detailed understanding of how, when, and where this information is represented, and that this knowledge can be used to improve the model’s performance.
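
A diagnostic classifier is, at its core, a simple (typically linear) model trained to predict a hypothesised feature, such as grammatical number, from a network's hidden states; high accuracy on held-out states suggests the feature is linearly decodable there. The sketch below trains a logistic-regression probe by batch gradient descent, assuming the hidden states are already available as lists of floats. The synthetic "hidden states", the function names, and the hyperparameters are hypothetical stand-ins; in practice one would extract states from a trained language model and would likely use an off-the-shelf classifier.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(states: List[Vector], labels: List[int],
                lr: float = 0.1, epochs: int = 200) -> Tuple[Vector, float]:
    """Fit a logistic-regression diagnostic classifier with batch gradient descent."""
    dim = len(states[0])
    w = [0.0] * dim
    b = 0.0
    n = len(states)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(states, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log-loss with respect to the logit
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_accuracy(w: Vector, b: float, states: List[Vector], labels: List[int]) -> float:
    """Fraction of states whose label the probe predicts correctly (threshold 0.5)."""
    correct = 0
    for x, y in zip(states, labels):
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        correct += int((p >= 0.5) == (y == 1))
    return correct / len(states)


def synthetic_states(n: int, dim: int, seed: int) -> Tuple[List[Vector], List[int]]:
    """Hypothetical stand-in for hidden states: the label (e.g. singular=0, plural=1)
    shifts the first dimension; all other dimensions are pure noise."""
    rng = random.Random(seed)
    states, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x[0] += 3.0 if y == 1 else -3.0
        states.append(x)
        labels.append(y)
    return states, labels


if __name__ == "__main__":
    train_x, train_y = synthetic_states(200, 8, seed=0)
    test_x, test_y = synthetic_states(200, 8, seed=1)
    w, b = train_probe(train_x, train_y)
    print("held-out probe accuracy:", probe_accuracy(w, b, test_x, test_y))
    print("weight on the informative dimension:", w[0])
```

Evaluating on states the probe was not trained on matters: the question a diagnostic classifier answers is whether the information generalises, not whether a linear model can memorise a particular set of vectors.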

Using Multiplicative Differential Neural Units

The proposed higher-order, memory-augmented recursive-NN models are shown to be effective on two challenging mathematical equation tasks, with improved extrapolation, stable performance, and faster convergence.

Analysing Representations of Memory Impairment in a Clinical Notes Classification Model

This method reveals the types of sentences that lead the model to make incorrect diagnoses and identifies clusters of sentences in the embedding space that correlate strongly with importance scores for each clinical diagnosis class.

On the Realization of Compositionality in Neural Networks

It is confirmed that the models with attentive guidance indeed infer more compositional solutions than the baseline, and analysis of the structural differences between the two model types indicates that guided networks exhibit a more modular structure with a small number of specialized, strongly connected neurons.

Predicting Inductive Biases of Pre-Trained Models

This work tests the hypothesis that the extent to which a feature influences a model’s decisions can be predicted from a combination of two factors: the feature’s extractability after pre-training, and the evidence available during fine-tuning.

Can RNNs learn Recursive Nested Subject-Verb Agreements?

This work presents a new framework for studying recursive processing in RNNs, using subject-verb agreement as a probe into the network’s representations; the results indicate how neural networks may extract bounded nested tree structures without learning a systematic recursive rule.

References

Showing 1-10 of 42 references

Diagnostic Classifiers Revealing how Neural Networks Process Hierarchical Structure

It is found that recursive neural networks can implement a generalising solution, and it is shown that gated recurrent neural networks, which process the expressions incrementally, perform surprisingly well on this task: they learn to predict the outcome of the arithmetic expressions with reasonable accuracy, although performance deteriorates with increasing length.

Diagnostic classification and symbolic guidance to understand and improve recurrent neural networks

Searching through a variety of methods to inspect and understand the internal dynamics of gated recurrent neural networks, using a task that focuses on a key feature of language (the hierarchical compositionality of meaning), yields a detailed understanding of the computations the networks implement to execute their task.

Recursive Neural Networks Can Learn Logical Semantics

This work generates artificial data from a logical grammar and uses it to evaluate the models' ability to learn to handle basic relational reasoning, recursive structures, and quantification, suggesting that they can learn suitable representations for logical inference in natural language.

Visualisation and ‘diagnostic classifiers’ reveal how recurrent and recursive neural networks process hierarchical structure

We investigate how neural networks can learn and process languages with hierarchical, compositional semantics. To this end, we define the artificial task of processing nested arithmetic expressions,...

Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks

This work presents LSTMVis, a visual analysis tool for recurrent neural networks that focuses on understanding their hidden state dynamics, and shows several use cases of the tool for analyzing specific hidden state properties on datasets containing nesting, phrase structure, and chord progressions.

Visualizing and Understanding Neural Models in NLP

This work describes four strategies, inspired by similar work in computer vision, for visualizing compositionality in neural models for NLP, including LSTM-style gates that measure information flow and gradient back-propagation.

Visualizing and Understanding Recurrent Networks

This work uses character-level language models as an interpretable testbed to provide an analysis of LSTM representations, predictions and error types, and reveals the existence of interpretable cells that keep track of long-range dependencies such as line lengths, quotes and brackets.

Learning task-dependent distributed representations by backpropagation through structure

C. Goller, A. Küchler. Proceedings of International Conference on Neural Networks (ICNN'96), 1996.

A connectionist architecture together with a novel supervised learning scheme which is capable of solving inductive inference tasks on complex symbolic structures of arbitrary size is presented.

The Forest Convolutional Network: Compositional Distributional Semantics with a Neural Chart and without Binarization

A new model, the Forest Convolutional Network, is introduced that avoids all of the challenges of current recursive neural network approaches for computing sentence meaning, by taking a parse forest as input, rather than a single tree, and by allowing arbitrary branching factors.

Learning Continuous Phrase Representations and Syntactic Parsing with Recursive Neural Networks

This work introduces a recursive neural network architecture that jointly parses natural language and learns vector space representations for variable-sized inputs. The learned representations capture semantic information; for instance, the phrases “decline to comment” and “would not disclose the terms” are close together in the induced embedding space.