Publications
Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection
TLDR
This work presents Iterative Nullspace Projection (INLP), a novel method for removing information from neural representations: linear classifiers are repeatedly trained to predict the property the authors aim to remove, and after each round the representations are projected onto the classifiers' nullspace.
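The TLDR above describes an iterative procedure; a minimal sketch of that idea in Python, assuming scikit-learn and NumPy and using hypothetical inputs `X` (a matrix of representations) and `z` (the protected attribute labels), might look as follows. This is an illustrative sketch, not the authors' released implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def nullspace_projection(W, tol=1e-10):
    """Projection matrix onto the nullspace of the rows of W (classifier weights)."""
    _, s, Vt = np.linalg.svd(W, full_matrices=False)
    row_basis = Vt[s > tol]                        # directions the classifier uses
    return np.eye(W.shape[1]) - row_basis.T @ row_basis

def inlp_sketch(X, z, n_iters=10):
    """Repeatedly fit a linear classifier for z and project X onto its nullspace."""
    X_proj = X.copy()
    for _ in range(n_iters):
        clf = LogisticRegression(max_iter=1000).fit(X_proj, z)
        P = nullspace_projection(clf.coef_)
        X_proj = X_proj @ P                        # P is symmetric, so rows are projected
    return X_proj
```

After enough iterations, a freshly trained linear classifier should predict `z` from the projected representations only at around chance accuracy, which is the usual sanity check for this kind of removal.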
oLMpics-On What Language Model Pre-training Captures
TLDR
This work proposes eight reasoning tasks that conceptually require operations such as comparison, conjunction, and composition; its findings can inform future work on designing new datasets, models, and objective functions for pre-training.
Adversarial Removal of Demographic Attributes from Text Data
TLDR
This work shows that authors' demographic information is encoded in, and can be recovered from, the intermediate representations learned by text-based neural classifiers, implying that the decisions of classifiers trained on textual data are not agnostic to, and likely condition on, demographic attributes.
Evaluating Models’ Local Decision Boundaries via Contrast Sets
TLDR
This work proposes a more rigorous annotation paradigm for NLP that helps close systematic gaps in the test data, recommending that dataset authors manually perturb the test instances in small but meaningful ways that (typically) change the gold label, creating contrast sets.
Measuring and Improving Consistency in Pretrained Language Models
TLDR
This work introduces PARAREL, a high-quality resource of English paraphrases of cloze-style queries. An analysis of the representational spaces of pretrained language models suggests that they are poorly structured and currently not suitable for representing knowledge in a robust way.
When Bert Forgets How To POS: Amnesic Probing of Linguistic Properties and MLM Predictions
TLDR
This work points out the inability to infer behavioral conclusions from probing results and offers an alternative method that focuses on how the information is being used rather than on what information is encoded.
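To make the "how the information is being used" idea concrete, here is a hedged sketch, assuming a projection `P` that removes some property from the representations (for example, one obtained with an INLP-style procedure as above) and a `predict` function mapping representations to the model's task predictions; the names `behavioral_influence`, `predict`, `X`, and `P` are illustrative assumptions, not the paper's API.

```python
import numpy as np

def behavioral_influence(X, P, predict):
    """Fraction of predictions that change once the property removed by P
    is no longer available in the representations X."""
    before = np.asarray(predict(X))
    after = np.asarray(predict(X @ P))   # "amnesic" representations
    return float(np.mean(before != after))
```

A large change suggests the model's behavior actually relies on the removed property, whereas an unchanged output suggests the property was encoded but not used.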
Evaluating NLP Models via Contrast Sets
TLDR
This work proposes a new annotation paradigm for NLP that helps close systematic gaps in the test data, recommending that after a dataset is constructed, its authors manually perturb the test instances in small but meaningful ways that change the gold label, creating contrast sets.
How Large Are Lions? Inducing Distributions over Quantitative Attributes
TLDR
This work proposes an unsupervised method for collecting quantitative information from large amounts of web data and uses it to create a new, very large resource of distributions over physical quantities associated with objects, adjectives, and verbs, called Distributions over Quantities (DoQ).
Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals
TLDR
This work points out that behavioral conclusions cannot be inferred from probing results and offers an alternative method that focuses on how the information is being used rather than on what information is encoded.
Adversarial Removal of Demographic Attributes Revisited
TLDR
This work shows that a diagnostic classifier trained on the biased baseline neural network also does not generalize to new samples, indicating that it relies on correlations specific to the particular data sample.