Publications
Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection
TLDR
This work presents Iterative Nullspace Projection (INLP), a novel method for removing information from neural representations: linear classifiers are repeatedly trained to predict the property to be removed, and the representations are then projected onto the classifiers' nullspace.
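For intuition, here is a minimal NumPy/scikit-learn sketch of the INLP loop. It is not the authors' reference implementation: the probe choice (LogisticRegression), the iteration count, and the naive composition of projections are simplifying assumptions (the paper composes the projections more carefully).

```python
# Simplified sketch of INLP: repeatedly train a linear probe for the
# protected attribute, then project its decision directions out of X.
import numpy as np
from sklearn.linear_model import LogisticRegression

def nullspace_projection(W):
    """Projection matrix onto the nullspace of the rows of W."""
    _, S, Vt = np.linalg.svd(W, full_matrices=False)
    basis = Vt[S > 1e-10]                 # orthonormal basis of W's rowspace
    return np.eye(W.shape[1]) - basis.T @ basis

def inlp(X, y, n_iterations=10):
    P = np.eye(X.shape[1])
    for _ in range(n_iterations):
        probe = LogisticRegression(max_iter=1000).fit(X, y)
        P_null = nullspace_projection(probe.coef_)
        X = X @ P_null                    # remove the directions the probe used
        P = P_null @ P                    # accumulate the overall projection
    return X, P
```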
BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models
TLDR
We introduce BitFit, a sparse-finetuning method in which only the bias terms of the model (or a subset of them) are modified, and show that with small-to-medium training data, applying BitFit on pre-trained BERT models is competitive with (and sometimes better than) fine-tuning the entire model.
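As a sketch of what tuning only the bias terms means in practice, assuming a PyTorch model (the name-matching heuristic and learning rate below are illustrative, not taken from the paper):

```python
# Hedged sketch of BitFit-style fine-tuning in PyTorch: freeze every
# parameter whose name does not mark it as a bias term.
import torch

def apply_bitfit(model: torch.nn.Module, lr: float = 1e-4):
    for name, param in model.named_parameters():
        param.requires_grad = "bias" in name
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.AdamW(trainable, lr=lr)  # optimizer over biases only
```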
Measuring and Improving Consistency in Pretrained Language Models
TLDR
The creation of PARAREL, a high-quality resource of cloze-style query paraphrases in English, and an analysis of the representational spaces of PLMs suggest that these spaces are poorly structured and currently not suitable for representing knowledge in a robust way.
Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals
TLDR
The inability to infer behavioral conclusions from probing results is pointed out, and an alternative method is offered that focuses on how the information is being used, rather than on what information is encoded.
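Schematically, the amnesic comparison can be sketched as follows; `predict_from_repr`, the INLP-style projection `P`, and the accuracy metric are hypothetical stand-ins, not the paper's code:

```python
# Sketch of the amnesic-probing logic: measure how much task behavior
# changes once a property is projected out of the representations.
import numpy as np

def behavioral_drop(X, P, predict_from_repr, y_task):
    before = (predict_from_repr(X) == y_task).mean()      # original accuracy
    after = (predict_from_repr(X @ P) == y_task).mean()   # after "forgetting"
    return before - after  # a large drop suggests the property was used
```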
Studying the Inductive Biases of RNNs with Synthetic Variations of Natural Languages
TLDR
A paradigm is proposed that creates synthetic versions of English, which differ from English in one or more typological parameters, and generates corpora for those languages based on a parsed English corpus; it is found that overt morphological case makes agreement prediction significantly easier, regardless of word order.
Ab Antiquo: Proto-language Reconstruction with RNNs
TLDR
This work addresses the task of proto-word reconstruction, in which the model is exposed to cognates in contemporary daughter languages and has to predict the proto-word in the ancestor language, and shows that neural sequence models outperform the conventional methods applied to this task so far.
Can LSTM Learn to Capture Agreement? The Case of Basque
TLDR
It is found that sequential models perform worse on agreement prediction in Basque than one might expect on the basis of previous agreement prediction work in English.
When Bert Forgets How To POS: Amnesic Probing of Linguistic Properties and MLM Predictions
TLDR
The inability to infer behavioral conclusions from probing results is pointed out, and an alternative method is offered that focuses on how the information is being used, rather than on what information is encoded.
It’s not Greek to mBERT: Inducing Word-Level Translations from Multilingual BERT
TLDR
The hypothesis that multilingual BERT learns representations which contain both a language-encoding component and an abstract, cross-lingual component is tested, and an empirical language-identity subspace within mBERT representations is identified.
Unsupervised Distillation of Syntactic Information from Contextualized Word Representations
TLDR
This work aims to learn a transformation of the contextualized vectors that discards the lexical semantics but keeps the structural information, and automatically generates groups of sentences that are structurally similar but semantically different.
...