Corpus ID: 244798694

Inducing Causal Structure for Interpretable Neural Networks

Atticus Geiger, Zhengxuan Wu, Hanson Lu, Josh Rozner, Elisa Kreiss, Thomas F. Icard, Noah D. Goodman, Christopher Potts
In many areas, we have well-founded insights about causal structure that would be useful to bring into our trained models while still allowing them to learn in a data-driven fashion. To achieve this, we present the new method of interchange intervention training (IIT). In IIT, we (1) align variables in the causal model with representations in the neural model and (2) train a neural model to match the counterfactual behavior of the causal model on a base input when aligned representations in… 
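The core operation in IIT is the interchange intervention: run the model on a source input, copy the activations of the representation aligned with a causal variable, and overwrite the corresponding activations when running on a base input. A minimal sketch, using a toy two-layer network with random weights and an assumed alignment (all names, sizes, and the choice of hidden units are illustrative assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network; the ReLU hidden layer plays the role of the
# neural representation aligned with a causal-model variable.
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 2))

def hidden(x):
    return np.maximum(0.0, x @ W1)  # hidden activations

def forward(x, swapped_hidden=None, idx=None):
    h = hidden(x)
    if swapped_hidden is not None:
        # Interchange intervention: overwrite the aligned slice of the
        # base input's hidden state with the source input's slice.
        h = h.copy()
        h[idx] = swapped_hidden[idx]
    return h @ W2

base = rng.normal(size=4)
source = rng.normal(size=4)
aligned = [0, 1, 2]  # hidden units assumed aligned with one causal variable

# Counterfactual output under the intervention; IIT would train this
# to match the causal model's counterfactual behavior.
cf = forward(base, swapped_hidden=hidden(source), idx=aligned)
```

In IIT proper, the loss on `cf` is backpropagated so that the chosen hidden units come to play the causal role of the aligned variable, while the model otherwise learns from data as usual.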


Causal Distillation for Language Models
It is shown to be beneficial to augment distillation with a third objective, the distillation interchange intervention training objective (DIITO), which encourages the student to imitate the causal dynamics of the teacher.
Relational reasoning and generalization using non-symbolic neural networks
Findings indicate that neural models are able to solve equality-based reasoning tasks, suggesting that essential aspects of symbolic reasoning can emerge from data-driven, non-symbolic learning processes.
A Framework for Learning to Request Rich and Contextually Useful Information from Humans
A general interactive framework is presented that enables an agent to request and interpret rich, contextually useful information from an assistant with knowledge of the task and the environment; the framework's practicality is demonstrated on a simulated human-assisted navigation problem.
Causal Abstractions of Neural Networks
It is discovered that a BERT-based model with state-of-the-art performance successfully realizes parts of the natural logic model’s causal structure, whereas a simpler baseline model fails to show any such structure, demonstrating that BERT representations encode the compositional structure of MQNLI.
Compositional Attention Networks for Machine Reasoning
The MAC network is presented, a novel fully differentiable neural network architecture designed to facilitate explicit and expressive reasoning that is computationally efficient and data-efficient, in particular requiring 5x less data than existing models to achieve strong results.
Neural Network Attributions: A Causal Perspective
A new attribution method for neural networks, developed from first principles of causality, is proposed, along with algorithms to efficiently compute the causal effects and to scale the approach to high-dimensional data.
ReaSCAN: Compositional Reasoning in Language Grounding
This work proposes ReaSCAN, a benchmark dataset that builds off gSCAN but requires compositional language interpretation and reasoning about entities and relations, and assesses two models on ReaSCAN: a multi-modal baseline and a state-of-the-art graph convolutional neural model.
Axiomatic Attribution for Deep Networks
We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms, Sensitivity and Implementation Invariance, that attribution methods ought to satisfy.
Neural Natural Language Inference Models Partially Embed Theories of Lexical Entailment and Negation
It is found that models trained on general-purpose NLI datasets fail systematically on MoNLI examples containing negation, but that MoNLI fine-tuning addresses this failure, suggesting that the BERT model at least partially embeds a theory of lexical entailment and negation at an algorithmic level.
Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks
This paper introduces the SCAN domain, consisting of a set of simple compositional navigation commands paired with the corresponding action sequences, and tests the zero-shot generalization capabilities of a variety of recurrent neural networks trained on SCAN with sequence-to-sequence methods.
Causal Effects of Linguistic Properties
TextCause, an algorithm for estimating causal effects of linguistic properties, is introduced and it is shown that the proposed method outperforms related approaches when estimating the effect of Amazon review sentiment on semi-simulated sales figures.
Probing the Probing Paradigm: Does Probing Accuracy Entail Task Relevance?
This work examines this probing paradigm through a case study in Natural Language Inference, showing that models can learn to encode linguistic properties even if they are not needed for the task on which the model was trained, and identifies that pretrained word embeddings play a considerable role in encoding these properties.
Learning the Difference that Makes a Difference with Counterfactually-Augmented Data
This paper focuses on natural language processing, introducing methods and resources for training models less sensitive to spurious patterns; humans are tasked with revising each document so that it accords with a counterfactual target label while retaining internal coherence.