A Novel Estimator of Mutual Information for Learning to Disentangle Textual Representations

@inproceedings{Colombo2021ANE,
  title={A Novel Estimator of Mutual Information for Learning to Disentangle Textual Representations},
  author={Pierre Colombo and Chlo{\'e} Clavel and Pablo Piantanida},
  booktitle={ACL},
  year={2021}
}
Learning disentangled representations of textual data is essential for many natural language tasks such as fair classification, style transfer and sentence generation, among others. The dominant existing approaches for text data either rely on training an adversary (discriminator) that aims at making attribute values difficult to infer from the latent code, or rely on minimising variational bounds of the mutual information between the latent code and the attribute value. However…
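For intuition, the two families of approaches mentioned in the abstract can be sketched in a few lines of PyTorch: an adversarial attribute classifier whose prediction error the encoder tries to maximise, and a variational (CLUB-style, sampled) upper bound on the mutual information I(z; y) used as a penalty. This is an illustrative sketch under assumed names (AttributeClassifier, mi_upper_bound_penalty, and so on), not code from the paper, whose contribution is precisely a different estimator of mutual information.

```python
# Illustrative sketch (not the paper's code) of the two families of
# disentanglement objectives described in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttributeClassifier(nn.Module):
    """Predicts the protected attribute y from the latent code z.
    Used either as an adversary or as the variational network q(y|z)."""

    def __init__(self, dim_z: int = 64, n_attr: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_z, 64), nn.ReLU(), nn.Linear(64, n_attr)
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)


def adversary_step_loss(adv: nn.Module, z: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Trains the adversary to recover y from a detached code z."""
    return F.cross_entropy(adv(z.detach()), y)


def encoder_adversarial_penalty(adv: nn.Module, z: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Encoder-side term: reward codes from which the (frozen) adversary
    cannot infer y, i.e. maximise the adversary's cross-entropy."""
    return -F.cross_entropy(adv(z), y)


def mi_upper_bound_penalty(q_y_given_z: nn.Module, z: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """CLUB-style sampled upper bound on I(z; y):
    E_{p(z,y)}[log q(y|z)] - E_{p(z)p(y)}[log q(y|z)],
    with the marginal term approximated by shuffling y within the batch."""
    log_q = F.log_softmax(q_y_given_z(z), dim=-1)
    joint = log_q.gather(1, y.unsqueeze(1)).squeeze(1)
    y_shuffled = y[torch.randperm(y.size(0))]
    marginal = log_q.gather(1, y_shuffled.unsqueeze(1)).squeeze(1)
    return (joint - marginal).mean()


if __name__ == "__main__":
    z = torch.randn(8, 64)           # latent codes from some text encoder
    y = torch.randint(0, 2, (8,))    # protected attribute values
    adv = AttributeClassifier()
    print(adversary_step_loss(adv, z, y).item())
    print(encoder_adversarial_penalty(adv, z, y).item())
    print(mi_upper_bound_penalty(adv, z, y).item())
```

In practice the attribute classifier (used as the adversary or as the variational network q(y|z)) and the encoder are updated in alternating steps, and the penalty is weighted against the downstream task loss.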
Learning Disentangled Textual Representations via Statistical Measures of Similarity
TLDR
This work introduces a family of regularizers for learning disentangled representations that do not require additional training, are faster, and do not involve additional tuning, while achieving better results when combined with both pretrained and randomly initialized text encoders.
What are the best systems? New perspectives on NLP Benchmarking
TLDR
This paper proposes a new procedure to rank systems based on their performance across different tasks, motivated by social choice theory, and shows that this method yields different conclusions on state-of-the-art systems than the mean-aggregation procedure while being both more reliable and robust.
InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation
TLDR
This paper introduces InfoLM, a family of untrained metrics that can be viewed as string-based metrics which address the aforementioned flaws thanks to a pre-trained masked language model and make use of information measures, allowing InfoLM to be adapted to different evaluation criteria.
Learning Disentangled Representations of Negation and Uncertainty
TLDR
This work attempts to disentangle the representations of negation, uncertainty, and content using a Variational Autoencoder, and finds that simply supervising the latent representations results in good disentanglement, but auxiliary objectives based on adversarial learning and mutual information minimization can provide additional disentanglement gains.
KNIFE: Kernelized-Neural Differential Entropy Estimation
TLDR
KNIFE, a fully parameterized, differentiable kernel-based estimator of differential entropy, is introduced to address shortcomings in previously proposed estimators of differential entropy, and the effectiveness of KNIFE-based estimation is demonstrated.
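As background for the KNIFE entry above, a kernel-based differential entropy estimator can be sketched as a mixture of Gaussian kernels with learnable means, bandwidths and weights, whose negative average log-density over samples serves as the entropy estimate. The sketch below follows that generic recipe; it is not the released KNIFE implementation, and all names are illustrative.

```python
# Generic sketch of a kernel-based differential entropy estimate:
# fit a Gaussian mixture density p_hat and return -mean log p_hat(x).
import math
import torch
import torch.nn as nn


class KernelEntropyEstimator(nn.Module):
    def __init__(self, dim: int, n_kernels: int = 32):
        super().__init__()
        self.means = nn.Parameter(torch.randn(n_kernels, dim))
        self.log_bw = nn.Parameter(torch.zeros(n_kernels, dim))  # per-kernel bandwidths
        self.logits = nn.Parameter(torch.zeros(n_kernels))       # mixture weights

    def log_density(self, x: torch.Tensor) -> torch.Tensor:
        # log p_hat(x) under a diagonal Gaussian mixture, shape (batch,)
        diff = x.unsqueeze(1) - self.means                        # (B, K, D)
        var = torch.exp(2.0 * self.log_bw)                        # (K, D)
        log_comp = -0.5 * ((diff ** 2) / var + 2.0 * self.log_bw
                           + math.log(2.0 * math.pi)).sum(dim=-1)  # (B, K)
        log_w = torch.log_softmax(self.logits, dim=0)              # (K,)
        return torch.logsumexp(log_w + log_comp, dim=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Differential entropy estimate H(X) ~ -E[log p_hat(X)]
        return -self.log_density(x).mean()
```

Minimising the returned value over the kernel parameters amounts to maximum-likelihood density fitting; the minimised objective approximates the cross-entropy between the data distribution and the fitted density, which upper-bounds the true differential entropy and tightens as the fit improves.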
Beam Search with Bidirectional Strategies for Neural Response Generation
TLDR
Bidirectional search strategies that combine two networks (left-to-right and right-to-left language models) to make bidirectional beam search possible are proposed, allowing any similarity measure to be used in the sentence selection criterion.
Disentangling Generative Factors in Natural Language with Discrete Variational Autoencoders
TLDR
A Variational Autoencoder based method is proposed that models language features as discrete variables and encourages independence between variables for learning disentangled representations; it outperforms continuous and discrete baselines on several qualitative and quantitative benchmarks.
Improving Multimodal fusion via Mutual Dependency Maximisation
TLDR
This work investigates unexplored penalties and proposes a set of new objectives that measure the dependency between modalities, and demonstrates that the new penalties lead to a consistent improvement across a large variety of state-of-the-art models on two well-known sentiment analysis datasets: CMU-MOSI and CMU-MOSEI.
Code-switched inspired losses for generic spoken dialog representations
TLDR
This work introduces new pretraining losses tailored to learning generic multilingual spoken dialogue representations that achieve better performance in both monolingual and multilingual settings.
...

References

Showing 1-10 of 80 references
Improving Disentangled Text Representation Learning with Information-Theoretic Guidance
TLDR
A novel method is proposed that effectively manifests disentangled representations of text, without any supervision on semantics, inducing style and content embeddings in two independent low-dimensional spaces.
Adversarial Removal of Demographic Attributes Revisited
TLDR
It is shown that a diagnostic classifier trained on the biased baseline neural network also does not generalize to new samples, indicating that it relies on correlations specific to that particular data sample.
Disentangled Representation Learning for Non-Parallel Text Style Transfer
TLDR
A simple yet effective approach is proposed that incorporates auxiliary multi-task and adversarial objectives for style prediction and bag-of-words prediction, respectively; this disentangled latent representation learning can be applied to style transfer on non-parallel corpora.
Learning Anonymized Representations with Adversarial Neural Networks
TLDR
A novel training objective is introduced for simultaneously training a predictor over target variables of interest (the regular labels) while preventing an intermediate representation from being predictive of the private labels.
Decomposing Textual Information For Style Transfer
TLDR
Using a framework of style transfer for texts, several empirical methods to assess information decomposition quality are proposed and validated with several state-of-the-art textual style transfer methods.
Style Transfer for Texts: Retrain, Report Errors, Compare with Rewrites
TLDR
This paper shows that standard assessment methodology for style transfer has several significant problems, suggests taking BLEU between input and human-written reformulations into consideration for benchmarks, and proposes three new architectures that outperform the state of the art in terms of this metric.
A General Class of Coefficients of Divergence of One Distribution from Another
Let P1 and P2 be two probability measures on the same space and let φ be the generalized Radon-Nikodym derivative of P2 with respect to P1. If C is a continuous convex function of a real variable…
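For readability, the core quantity behind this class of divergence coefficients (now usually called f-divergences) can be written explicitly; in the notation of the snippet above, φ is the generalized Radon-Nikodym derivative dP2/dP1 and C is continuous and convex. Ali and Silvey additionally allow composing this quantity with an increasing function, which is omitted here.

```latex
% Core quantity of the Ali-Silvey class (f-divergence form),
% with \varphi = dP_2/dP_1 and C continuous and convex:
d_C(P_1, P_2) \;=\; \mathbb{E}_{P_1}\!\big[\, C(\varphi) \,\big]
             \;=\; \int C\!\left(\frac{\mathrm{d}P_2}{\mathrm{d}P_1}\right) \mathrm{d}P_1 .
```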
Adversarial Removal of Demographic Attributes from Text Data
TLDR
It is shown that demographic information of authors is encoded in—and can be recovered from—the intermediate representations learned by text-based neural classifiers, and the implication is that decisions of classifiers trained on textual data are not agnostic to—and likely condition on—demographic attributes.
Style Transfer in Text: Exploration and Evaluation
TLDR
This work proposes two novel evaluation metrics that measure two aspects of style transfer: transfer strength and content preservation, and shows that the proposed content preservation metric is highly correlated with human judgments.
Toward Controlled Generation of Text
TLDR
A new neural generative model is proposed which combines variational auto-encoders and holistic attribute discriminators for effective imposition of semantic structures in generic generation and manipulation of text.
...