Disentangling Representations of Text by Masking Transformers

Xiongyi Zhang, Jan-Willem van de Meent, Byron C. Wallace
Representations from large pretrained models such as BERT encode a range of features into monolithic vectors, affording strong predictive accuracy across diverse downstream tasks. In this paper we explore whether it is possible to learn disentangled representations by identifying existing subnetworks within pretrained models that encode distinct, complementary aspects. Concretely, we learn binary masks over transformer weights or hidden units to uncover subsets of features that correlate…
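The masking idea in the abstract can be sketched as follows. This is an illustrative toy, not the paper's actual implementation: it assumes a straight-through estimator over real-valued mask logits applied to a single frozen weight matrix, and all names and hyperparameters here are invented for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# A frozen "pretrained" weight matrix (stand-in for one transformer layer).
W = rng.normal(size=(4, 4))

# Learnable real-valued logits; the binary mask is their hard threshold.
logits = np.zeros_like(W)

x = rng.normal(size=(4,))       # input features
target = rng.normal(size=(4,))  # aspect-specific target

lr = 0.5
for _ in range(200):
    mask = (sigmoid(logits) > 0.5).astype(float)  # hard binary mask
    y = (W * mask) @ x                            # masked forward pass
    err = y - target
    # Straight-through trick: the hard threshold has zero gradient, so we
    # backpropagate through the soft sigmoid as if it were the mask.
    grad_mask = np.outer(err, x) * W
    grad_logits = grad_mask * sigmoid(logits) * (1.0 - sigmoid(logits))
    logits -= lr * grad_logits   # W itself is never updated

final_mask = (sigmoid(logits) > 0.5).astype(float)
loss = float(np.sum(((W * final_mask) @ x - target) ** 2))
```

Only the mask logits are trained while `W` stays frozen, mirroring the paper's premise that the desired subnetwork already exists inside the pretrained weights and only needs to be selected.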
1 Citation
Rational Design Inspired Application of Natural Language Processing Algorithms to Red Shift mNeptune684
This NLP paradigm is used in a protein engineering effort to further red shift the emission wavelength of the red fluorescent protein mNeptune684 using only a small number of functional training variants ('Low-N' scenario).

References
Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations
This paper theoretically shows that the unsupervised learning of disentangled representations is fundamentally impossible without inductive biases on both the models and the data, and trains more than 12000 models covering most prominent methods and evaluation metrics on seven different data sets.
XLNet: Generalized Autoregressive Pretraining for Language Understanding
XLNet is proposed, a generalized autoregressive pretraining method that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and overcomes the limitations of BERT thanks to its autoregressive formulation.
Towards a Definition of Disentangled Representations
It is suggested that those transformations that change only some properties of the underlying world state, while leaving all other properties invariant are what gives exploitable structure to any kind of data.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Unsupervised Distillation of Syntactic Information from Contextualized Word Representations
This work aims to learn a transformation of the contextualized vectors, that discards the lexical semantics, but keeps the structural information in the vectors, and automatically generates groups of sentences which are structurally similar but semantically different.
What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties
10 probing tasks designed to capture simple linguistic features of sentences are introduced and used to study embeddings generated by three different encoders trained in eight distinct ways, uncovering intriguing properties of both encoders and training methods.
A Framework for the Quantitative Evaluation of Disentangled Representations
A framework for the quantitative evaluation of disentangled representations when the ground-truth latent structure is available is proposed and three criteria are explicitly defined and quantified to elucidate the quality of learnt representations and thus compare models on an equal basis.
Learning Disentangled Representations of Texts with Application to Biomedical Abstracts
It is shown that the method learns representations that encode these clinically salient aspects, and that these can be effectively used to perform aspect-specific retrieval in experiments on two multi-aspect review corpora.
Disentangled Representation Learning for Non-Parallel Text Style Transfer
A simple yet effective approach is proposed, which incorporates auxiliary multi-task and adversarial objectives, for style prediction and bag-of-words prediction, respectively, and this disentangled latent representation learning can be applied to style transfer on non-parallel corpora.
On the Fairness of Disentangled Representations
Analyzing the representations of more than 10,000 trained state-of-the-art disentangled models, it is observed that several disentanglement scores are consistently correlated with increased fairness, suggesting that disentanglement may be a useful property to encourage fairness when sensitive variables are not observed.