Corpus ID: 244798694

Inducing Causal Structure for Interpretable Neural Networks

Atticus Geiger, Zhengxuan Wu, Hanson Lu, Josh Rozner, Elisa Kreiss, Thomas F. Icard, Noah D. Goodman, Christopher Potts
In many areas, we have well-founded insights about causal structure that would be useful to bring into our trained models while still allowing them to learn in a data-driven fashion. To achieve this, we present the new method of interchange intervention training (IIT). In IIT, we (1) align variables in the causal model with representations in the neural model and (2) train a neural model to match the counterfactual behavior of the causal model on a base input when aligned representations in… 
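The two steps described in the abstract can be sketched on a toy network. Everything here is illustrative: the network shapes, the choice of hidden unit 2 as the representation aligned with a causal variable, and the `forward` helper are assumptions for the sketch, not the paper's implementation. In full IIT, the counterfactual prediction below would be trained against the causal model's counterfactual output; only the intervention itself is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-layer network: y = W2 @ relu(W1 @ x). Shapes are arbitrary.
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(1, 4))

def forward(x, swap_idx=None, swap_val=None):
    """Run the network; optionally overwrite hidden unit `swap_idx`
    with `swap_val` — an interchange intervention on the aligned
    representation."""
    h = np.maximum(W1 @ x, 0.0)
    if swap_idx is not None:
        h = h.copy()
        h[swap_idx] = swap_val
    return W2 @ h, h

base = rng.normal(size=3)
source = rng.normal(size=3)

# Step 1: run the source input and cache the aligned representation
# (here, hidden unit 2 is assumed to be aligned with a causal variable).
_, h_source = forward(source)

# Step 2: re-run the base input with the aligned unit replaced by its
# value from the source run. IIT would train this counterfactual
# prediction to match the causal model's counterfactual behavior.
y_counterfactual, _ = forward(base, swap_idx=2, swap_val=h_source[2])
```

The same pattern scales to real models by caching and patching activations at the aligned locations during the forward pass.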


Causal Distillation for Language Models
It is beneficial to augment distillation with a third objective that encourages the student to imitate the causal computation process of the teacher through interchange intervention training (IIT); this yields lower perplexity on Wikipedia masked language modeling and marked improvements on the GLUE benchmark.
Relational reasoning and generalization using non-symbolic neural networks
Findings indicate that neural models are able to solve equality-based reasoning tasks, suggesting that essential aspects of symbolic reasoning can emerge from data-driven, non-symbolic learning processes.
A Framework for Learning to Request Rich and Contextually Useful Information from Humans
A general interactive framework is presented that enables an agent to determine and request contextually useful information from an assistant and to incorporate rich forms of responses into its decision-making process.
Causal Abstractions of Neural Networks
It is discovered that a BERT-based model with state-of-the-art performance successfully realizes parts of the natural logic model’s causal structure, whereas a simpler baseline model fails to show any such structure, demonstrating that BERT representations encode the compositional structure of MQNLI.
Compositional Attention Networks for Machine Reasoning
The MAC network is presented, a novel fully differentiable neural network architecture designed to facilitate explicit and expressive reasoning that is computationally efficient and data efficient, requiring 5x less data than existing models to achieve strong results.
Neural Network Attributions: A Causal Perspective
A new attribution method for neural networks developed using first principles of causality is proposed, and algorithms to efficiently compute the causal effects, as well as scale the approach to data with large dimensionality are proposed.
ReaSCAN: Compositional Reasoning in Language Grounding
This work proposes ReaSCAN, a benchmark dataset that builds off gSCAN but requires compositional language interpretation and reasoning about entities and relations, and assesses two models on ReaSCAN: a multi-modal baseline and a state-of-the-art graph convolutional neural model.
Axiomatic Attribution for Deep Networks
We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms, Sensitivity and Implementation Invariance, that attribution methods ought to satisfy.
Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks
This paper introduces the SCAN domain, consisting of a set of simple compositional navigation commands paired with the corresponding action sequences, and tests the zero-shot generalization capabilities of a variety of recurrent neural networks trained on SCAN with sequence-to-sequence methods.
Learning the Difference that Makes a Difference with Counterfactually-Augmented Data
This paper focuses on natural language processing, introducing methods and resources for training models that are less sensitive to spurious patterns; humans are tasked with revising each document so that it accords with a counterfactual target label while retaining internal coherence.
Pointer Value Retrieval: A new benchmark for understanding the limits of neural network generalization
This paper introduces a novel benchmark, Pointer Value Retrieval (PVR) tasks, that explore the limits of neural network generalization, and demonstrates that this task structure provides a rich testbed for understanding generalization.
What Does BERT Look at? An Analysis of BERT’s Attention
It is shown that certain attention heads correspond well to linguistic notions of syntax and coreference, and an attention-based probing classifier is proposed and used to demonstrate that substantial syntactic information is captured in BERT’s attention.
Posing Fair Generalization Tasks for Natural Language Inference
This paper defines and motivates a formal notion of fairness in this sense and applies these ideas to natural language inference by constructing very challenging but provably fair artificial datasets, showing that standard neural models fail to generalize in the required ways.