Topics to Avoid: Demoting Latent Confounds in Text Classification

@inproceedings{Kumar2019TopicsTA,
  title={Topics to Avoid: Demoting Latent Confounds in Text Classification},
  author={Sachin Kumar and Shuly Wintner and Noah A. Smith and Yulia Tsvetkov},
  booktitle={Conference on Empirical Methods in Natural Language Processing},
  year={2019}
}
Despite impressive performance on many text classification tasks, deep neural networks tend to learn frequent superficial patterns that are specific to the training data and do not always generalize well. In this work, we observe this limitation with respect to the task of native language identification. We find that standard text classifiers which perform well on the test set end up learning topical features which are confounds of the prediction task (e.g., if the input text mentions Sweden… 
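
The failure mode described here (classifiers latching onto topical confounds) is commonly addressed, and likely in the spirit of this paper's approach, by training the classifier adversarially against a predictor of the confound. The sketch below is a rough, hypothetical illustration rather than the paper's exact objective: an adversary tries to recover a latent topic label from the encoder's representation, and the encoder and task classifier are updated to solve the main task while fooling that adversary. The module sizes, label sets (11 native languages, 50 topics), and the alternating update scheme are all assumptions.

# Rough, illustrative sketch of adversarial confound demotion (not the paper's
# exact objective). An adversary learns to predict a latent topic label from
# the encoder's representation; the encoder and task classifier then learn to
# solve the main task while making that prediction hard.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(10000, 256), nn.ReLU())  # bag-of-words -> hidden (sizes assumed)
task_head = nn.Linear(256, 11)   # e.g. 11 native languages (assumed label set)
adversary = nn.Linear(256, 50)   # predicts one of 50 latent topics (assumed)

opt_main = torch.optim.Adam(list(encoder.parameters()) + list(task_head.parameters()), lr=1e-3)
opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()

def training_step(x, y_task, y_topic, lam=1.0):
    # (1) Train the adversary on a detached representation.
    with torch.no_grad():
        h = encoder(x)
    adv_loss = ce(adversary(h), y_topic)
    opt_adv.zero_grad()
    adv_loss.backward()
    opt_adv.step()

    # (2) Train encoder + classifier: do well on the task, badly for the adversary.
    h = encoder(x)
    main_loss = ce(task_head(h), y_task) - lam * ce(adversary(h), y_topic)
    opt_main.zero_grad()
    main_loss.backward()
    opt_main.step()
    return main_loss.item(), adv_loss.item()

Maximizing the adversary's cross-entropy is the simplest formulation; related work instead pushes the adversary's predictions toward a uniform distribution or uses a gradient reversal layer (see the Domain-Adversarial Training reference below).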

Citations

Demoting Racial Bias in Hate Speech Detection

Experimental results suggest that the adversarial training method used in this paper is able to substantially reduce the false positive rate for AAE text while only minimally affecting the performance of hate speech classification.

Demoting the Lead Bias in News Summarization via Alternating Adversarial Learning

Experiments show that the novel technique introduced can effectively demote the model's learned lead bias and improve its generalization to out-of-distribution data, with little to no performance loss on in-distribution data.

How to keep text private? A systematic review of deep learning methods for privacy-preserving natural language processing

A novel taxonomy is introduced that classifies existing privacy-preserving NLP methods into three categories (data safeguarding methods, trusted methods, and verification methods), and an extensive summary of privacy threats, datasets for applications, and metrics for privacy evaluation is presented.

Dataset Geography: Mapping Language Data to Language Users

This work uses entity recognition and linking systems to quantify whether, and by how much, NLP datasets match the expected needs of the language speakers, and explores some geographical and economic factors that may explain the observed dataset distributions.

Influence Tuning: Demoting Spurious Correlations via Instance Attribution and Instance-Driven Updates

In a controlled setup, influence tuning can help deconfound the model from spurious patterns in the data, significantly outperforming baseline methods that use adversarial training.

Influence Tuning: Demoting Spurious Correlations via Instance Attribution and Instance-Driven Updates

This work proposes influence tuning—a procedure that leverages model interpretations to update the model parameters towards a plausible interpretation (rather than an interpretation that relies on spurious patterns in the data) in addition to learning to predict the task labels.
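
As a loose illustration of training toward a plausible interpretation in addition to the task labels, the sketch below combines the task loss with a penalty on attribution mass that falls on tokens marked as spurious. It uses simple input-gradient saliency, whereas the paper's procedure is built on instance attribution, so the function name, the attribution choice, and the loss form are all assumptions.

# Loose illustration only: task loss plus a penalty that discourages attribution
# mass on tokens annotated as spurious (e.g., topical confound words).
import torch
import torch.nn as nn

def influence_style_loss(model, embeddings, labels, spurious_mask, alpha=0.1):
    """embeddings: [batch, seq, dim]; spurious_mask: [batch, seq], 1.0 where a token is a known confound."""
    embeddings = embeddings.clone().requires_grad_(True)
    logits = model(embeddings)
    task_loss = nn.functional.cross_entropy(logits, labels)
    # Token saliency = L2 norm of the gradient of the task loss w.r.t. the token embedding.
    grads = torch.autograd.grad(task_loss, embeddings, create_graph=True)[0]
    saliency = grads.norm(dim=-1)                        # [batch, seq]
    # Penalize attribution that lands on tokens flagged as spurious.
    attribution_penalty = (saliency * spurious_mask).sum(dim=-1).mean()
    return task_loss + alpha * attribution_penalty

Minimizing the combined loss updates the parameters both to fit the labels and to shift the model's explanation away from the spurious tokens.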

Speaker Information Can Guide Models to Better Inductive Biases: A Case Study On Predicting Code-Switching

To the authors' knowledge, this work is the first to incorporate speaker characteristics in a neural model for code-switching and, more generally, it takes a step towards developing transparent, personalized models that use speaker information in a controlled way.

Handling Bias in Toxic Speech Detection: A Survey

Detecting online toxicity has always been a challenge due to its inherent subjectivity. Factors such as the context, geography, socio-political climate, and background of the producers and consumers of the posts all shape what is perceived as toxic.

Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey

A survey of practical methods for addressing potential threats and societal harms from language generation models is presented, drawing on several prior works' taxonomies of language model risks to give a structured overview of strategies for detecting and ameliorating different kinds of risks and harms of language generators.

References

Showing 1-10 of 44 references

Deconfounded Lexicon Induction for Interpretable Social Science

Two deep learning algorithms are introduced that are more predictive and less confound-related than standard feature weighting and lexicon induction techniques such as regression and log odds; they are used to induce lexicons that are predictive of timely responses to consumer complaints, enrollment from course descriptions, and sales from product descriptions.

Generative Adversarial Nets

We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
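
The adversarial process described here is the familiar two-player minimax game; for reference, its value function can be written as

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))],

where D is trained to distinguish real samples from generated ones and G is trained to fool D.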

Fightin' Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict

A variety of techniques for selecting words that capture partisan, or other, differences in political speech and for evaluating the relative importance of those words are discussed and several new approaches based on Bayesian shrinkage and regularization are introduced.
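
The best-known of the Bayesian shrinkage approaches discussed here is the log-odds ratio with an informative Dirichlet prior; the sketch below computes its z-scores from two word-count dictionaries. The function name, the small floor for unseen words, and the use of a background corpus as the prior are assumptions for illustration.

# Sketch: z-scored log-odds ratio with an informative Dirichlet prior.
# counts_a / counts_b: word -> count in the two corpora being compared;
# prior: word -> count in a large background corpus (the informative prior).
import math

def log_odds_with_prior(counts_a, counts_b, prior):
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    a0 = sum(prior.values())
    z_scores = {}
    for w in set(counts_a) | set(counts_b):
        aw = prior.get(w, 0.01)              # small floor so unseen words stay defined (assumed)
        ya, yb = counts_a.get(w, 0), counts_b.get(w, 0)
        delta = (math.log((ya + aw) / (n_a + a0 - ya - aw))
                 - math.log((yb + aw) / (n_b + a0 - yb - aw)))
        var = 1.0 / (ya + aw) + 1.0 / (yb + aw)
        z_scores[w] = delta / math.sqrt(var)  # large |z| = strongly associated with one corpus
    return z_scores

Words with large positive z-scores are strongly associated with the first corpus and large negative z-scores with the second, with the prior shrinking estimates for rare words.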

A Report on the 2017 Native Language Identification Shared Task

The fusion track showed that combining the written and spoken responses provides a large boost in prediction accuracy, and multiple classifier systems were the most effective in all tasks, with most based on traditional classifiers with lexical/syntactic features.

Data Decisions and Theoretical Implications when Adversarially Learning Fair Representations

An adversarial training procedure is used to remove information about the sensitive attribute from the latent representation learned by a neural network, and the data distribution empirically drives the adversary's notion of fairness.

Domain-Adversarial Training of Neural Networks

A new representation learning approach for domain adaptation, in which data at training and test time come from similar but different distributions, is proposed; it can be implemented in almost any feed-forward model by augmenting it with a few standard layers and a new gradient reversal layer.
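
A minimal sketch of the gradient reversal layer named here, written in PyTorch as an illustration (the scaling constant and the usage comment are assumptions, not a reference implementation):

import torch

class GradientReversal(torch.autograd.Function):
    # Identity in the forward pass; multiplies incoming gradients by -lam on the backward pass.
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

# Usage sketch: the domain classifier sees the features unchanged, but the
# feature extractor receives reversed gradients, nudging it toward
# domain-invariant (here: confound-invariant) representations.
# domain_logits = domain_classifier(GradientReversal.apply(features, 1.0))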

Attention is not not Explanation

It is shown that even when reliable adversarial distributions can be found, they do not perform well on a simple diagnostic, indicating that prior work does not disprove the usefulness of attention mechanisms for explainability.

Saliency-driven Word Alignment Interpretation for Neural Machine Translation

This paper shows that NMT models do learn interpretable word alignments, which can only be revealed with proper interpretation methods, and proposes a series of such methods that are model-agnostic, can be applied either offline or online, and require no parameter updates or architectural changes.

Is Attention Interpretable?

While attention noisily predicts input components' overall importance to a model, it is by no means a fail-safe indicator; there are many cases in which this does not hold, and gradient-based rankings of attention weights predict their effects better than their magnitudes do.

The International Corpus of Learner English: A New Resource for Foreign Language Learning and Teaching and Second Language Acquisition Research

In the late 1950s, when corpus linguistics made its debut on the linguistic scene, it was a very modest enterprise in the hands of a small group of enthusiasts. Looking back on this period, Leech