Corpus ID: 216553696

Causal Mediation Analysis for Interpreting Neural NLP: The Case of Gender Bias

  • J. Vig, Sebastian Gehrmann, Yonatan Belinkov, Sharon Qian, Daniel Nevo, Y. Singer, S. Shieber
  • Published 2020
  • Computer Science
  • ArXiv
  • Common methods for interpreting neural models in natural language processing typically examine either their structure or their behavior, but not both. We propose a methodology grounded in the theory of causal mediation analysis for interpreting which parts of a model are causally implicated in its behavior. It enables us to analyze the mechanisms by which information flows from input to output through various model components, known as mediators. We apply this methodology to analyze gender bias…