Rationalizing Text Matching: Learning Sparse Alignments via Optimal Transport

@article{Swanson2020RationalizingTM,
  title={Rationalizing Text Matching: Learning Sparse Alignments via Optimal Transport},
  author={Kyle Swanson and L. Yu and Tao Lei},
  journal={ArXiv},
  year={2020},
  volume={abs/2005.13111}
}
Selecting input features of top relevance has become a popular method for building self-explaining models. In this work, we extend this selective rationalization approach to text matching, where the goal is to jointly select and align text pieces, such as tokens or sentences, as a justification for the downstream prediction. Our approach employs optimal transport (OT) to find a minimal cost alignment between the inputs. However, directly applying OT often produces dense and therefore uninterpretable alignments. To address this, we introduce constrained variants of the OT problem that yield highly sparse alignments with controllable sparsity.
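As a rough illustration of the alignment idea described above (not the paper's constrained variants), the sketch below computes an entropic-OT transport plan between two sets of token embeddings with Sinkhorn iterations; the names and hyperparameters are illustrative. Note that nothing in this vanilla formulation encourages sparsity, which is exactly the limitation the abstract points to.

```python
# Minimal sketch (assumed setup, not the paper's implementation): entropic OT
# between two token-embedding matrices via Sinkhorn scaling.
import numpy as np

def sinkhorn_alignment(X, Y, reg=0.1, n_iters=200):
    """X: (n, d), Y: (m, d) token embeddings; returns an (n, m) transport plan."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    C = 1.0 - Xn @ Yn.T                        # cosine-distance cost matrix

    a = np.full(X.shape[0], 1.0 / X.shape[0])  # uniform source marginal
    b = np.full(Y.shape[0], 1.0 / Y.shape[0])  # uniform target marginal
    K = np.exp(-C / reg)                       # Gibbs kernel

    u = np.ones_like(a)
    for _ in range(n_iters):                   # alternate scaling to match marginals
        v = b / (K.T @ u)
        u = a / (K @ v)
    return np.diag(u) @ K @ np.diag(v)         # dense plan; total mass ~ 1

rng = np.random.default_rng(0)
P = sinkhorn_alignment(rng.normal(size=(5, 64)), rng.normal(size=(7, 64)))
print(P.shape, round(P.sum(), 3))              # (5, 7) 1.0
```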
SPECTRA: Sparse Structured Text Rationalization
TLDR
This paper presents a unified framework for deterministic extraction of structured explanations via constrained inference on a factor graph, formulated as a differentiable layer, and provides a comparative study of stochastic and deterministic methods for rationale extraction on classification and natural language inference tasks.
Alignment Rationale for Natural Language Inference
TLDR
This work proposes AREC, a post-hoc, feature-selection-based approach that generates alignment rationale explanations for co-attention-based NLI models, keeping few but sufficient alignments while preserving the prediction of the target model.
QUASER: Question Answering with Scalable Extractive Rationalization
TLDR
It is shown that unsupervised generative models that extract dual-purpose rationales can produce more meaningful rationales that are less influenced by dataset artifacts, and as a result also achieve the state of the art on rationale extraction metrics on four datasets from the ERASER benchmark.
Toward Interpretable Semantic Textual Similarity via Optimal Transport-based Contrastive Sentence Learning
TLDR
This work explicitly describes sentence distance as the weighted sum of contextualized token distances, framed as a transportation problem, and presents an optimal transport-based distance measure named RCMD; it identifies and leverages semantically aligned token pairs, improving both sentence-similarity estimates and their interpretability.
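As a small, hedged illustration of the "sentence distance as transported token distances" idea (not the RCMD implementation), the sketch below assumes the POT library is available and solves the transportation problem exactly:

```python
# Illustrative sketch: sentence distance = OT-weighted sum of pairwise
# contextualized token distances, solved exactly with POT's emd2.
import numpy as np
import ot  # POT: Python Optimal Transport

def ot_sentence_distance(X, Y):
    """X: (n, d), Y: (m, d) contextualized token embeddings of two sentences."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    C = 1.0 - Xn @ Yn.T                        # pairwise cosine distances
    a = np.full(X.shape[0], 1.0 / X.shape[0])  # uniform token weights
    b = np.full(Y.shape[0], 1.0 / Y.shape[0])
    return ot.emd2(a, b, C)                    # optimal transport cost
```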
OTExtSum: Extractive Text Summarisation with Optimal Transport
TLDR
This paper proposes the Optimal Transport Extractive Summariser (OTExtSum), a novel non-learning-based method that, for the first time, formulates text summarisation as an Optimal Transport (OT) problem, and shows that it outperforms state-of-the-art non-learning-based methods and several recent learning-based methods in terms of the ROUGE metric.
FiD-Ex: Improving Sequence-to-Sequence Models for Extractive Rationale Generation
TLDR
FiD-Ex introduces sentence markers that curb explanation fabrication by encouraging extractive generation, uses the fusion-in-decoder architecture to handle long input contexts, and applies intermediate fine-tuning on restructured open-domain QA datasets to improve few-shot performance.
Multi-Vector Models with Textual Guidance for Fine-Grained Scientific Document Similarity
TLDR
This work develops multi-vector representations where vectors correspond to sentence-level aspects of documents, and presents two methods for aspect matching: a fast method that only matches single aspects and a method that makes sparse multiple matches with an Optimal Transport mechanism that computes an Earth Mover’s Distance between aspects.
Can Rationalization Improve Robustness?
TLDR
This study systematically generates various types of ‘AddText’ attacks for both token- and sentence-level rationalization tasks and performs an extensive empirical evaluation of state-of-the-art rationale models across five different tasks, revealing that rationale models can improve robustness against AddText attacks but still struggle in certain scenarios.
Word Alignment by Fine-tuning Embeddings on Parallel Corpora
TLDR
This work examines methods that take pre-trained contextualized word embeddings derived from multilingually trained language models and fine-tune them on parallel text with objectives designed to improve alignment quality, and proposes methods to effectively extract alignments from these fine-tuned models.
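One common, simple way to read word alignments off a token-similarity matrix is to softmax it in both directions, threshold, and intersect; the sketch below is only an approximation of that recipe (the paper's extraction methods and training objectives are more involved), and the threshold value is an assumption.

```python
# Hedged sketch: extract token alignments from a source-target similarity matrix.
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def extract_alignments(S, threshold=1e-3):
    """S: (n, m) similarity matrix between source and target token embeddings."""
    src2tgt = softmax(S, axis=1)   # each source token distributes over targets
    tgt2src = softmax(S, axis=0)   # each target token distributes over sources
    keep = (src2tgt > threshold) & (tgt2src > threshold)  # agree in both directions
    return [(int(i), int(j)) for i, j in zip(*np.nonzero(keep))]
```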
Translation-Based Implicit Annotation Projection for Zero-Shot Cross-Lingual Event Argument Extraction
TLDR
This work investigates a translation-based method that implicitly projects annotations from the source language to the target language using translation-based parallel corpora, which is more cost-effective than previous work on zero-shot cross-lingual EAE.
...

References

Showing 1-10 of 66 references
A Regularized Framework for Sparse and Structured Neural Attention
TLDR
This paper proposes a new framework for sparse and structured attention, building upon a smoothed max operator, and shows that the gradient of this operator defines a mapping from real values to probabilities, suitable as an attention mechanism.
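Sparsemax is one of the sparse mappings that fall under this kind of framework; the sketch below implements standard sparsemax (Martins & Astudillo, 2016), not this paper's generalized operators, and shows how a score vector is projected onto the simplex so that low-scoring entries receive exactly zero probability.

```python
# Sparsemax (Martins & Astudillo, 2016): Euclidean projection onto the simplex.
import numpy as np

def sparsemax(z):
    """Map a 1-D score vector z to a sparse probability distribution."""
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = z_sorted * k > (cumsum - 1)    # entries that stay positive
    k_z = k[support][-1]                     # support size
    tau = (cumsum[support][-1] - 1) / k_z    # threshold
    return np.maximum(z - tau, 0.0)

print(sparsemax(np.array([1.0, 0.8, 0.1])))  # [0.6 0.4 0. ] -- exact zeros
```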
Gromov-Wasserstein Alignment of Word Embedding Spaces
TLDR
This paper casts the correspondence problem directly as an optimal transport (OT) problem, building on the idea that word embeddings arise from metric recovery algorithms and exploiting the Gromov-Wasserstein distance, which measures how similarities between pairs of words relate across languages.
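A minimal way to experiment with this kind of relational alignment is the Gromov-Wasserstein solver in the POT library (assumed installed); the paper's full method adds its own normalization and optimization choices on top of this, so the sketch is illustrative only.

```python
# Hedged sketch: couple two embedding spaces by matching their intra-space
# distance structure with Gromov-Wasserstein (via the POT library).
import numpy as np
import ot  # POT: Python Optimal Transport

def gw_align(X, Y):
    """X: (n, d1), Y: (m, d2) word embeddings from two unaligned spaces."""
    C1 = ot.dist(X, X)                       # intra-space distance matrices
    C2 = ot.dist(Y, Y)
    C1 = C1 / C1.max()
    C2 = C2 / C2.max()
    p = np.full(X.shape[0], 1.0 / X.shape[0])
    q = np.full(Y.shape[0], 1.0 / Y.shape[0])
    # Coupling that matches words whose relational structure agrees across spaces.
    return ot.gromov.gromov_wasserstein(C1, C2, p, q, loss_fun='square_loss')
```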
Rationalizing Neural Predictions
TLDR
The approach combines two modular components, a generator and an encoder, which are trained to operate well together: the generator specifies a distribution over text fragments as candidate rationales, and these are passed through the encoder for prediction.
Interpretable Neural Predictions with Differentiable Binary Variables
TLDR
This work proposes a latent model that mixes discrete and continuous behaviour, allowing both binary selections and gradient-based training without REINFORCE; it can tractably compute the expected value of penalties such as L0, which allows the model to be directly optimised towards a pre-specified text selection rate.
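As a concrete illustration of the "tractable expected L0" idea, the sketch below computes the expected number of non-zero gates for a stretched-and-rectified Kumaraswamy variable, following the general hard-concrete-style recipe; the stretch limits and formula are assumptions and may differ from the paper's exact parameterization.

```python
# Hedged sketch: expected L0 of hard (stretched + rectified) Kumaraswamy gates.
import numpy as np

def expected_l0(a, b, l=-0.1, r=1.1):
    """a, b: arrays of per-token Kumaraswamy parameters; support stretched to (l, r), then clamped to [0, 1]."""
    t = (0.0 - l) / (r - l)                  # pre-clamp point that maps to z = 0
    cdf_at_zero = 1.0 - (1.0 - t ** a) ** b  # Kumaraswamy CDF evaluated at t
    return np.sum(1.0 - cdf_at_zero)         # sum_i P(z_i != 0)

print(expected_l0(np.ones(4), np.ones(4)))   # ~3.67 expected selected tokens out of 4
```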
Structured Attention Networks
TLDR
This work shows that structured attention networks are simple extensions of the basic attention procedure, and that they allow for extending attention beyond the standard soft-selection approach, such as attending to partial segmentations or to subtrees.
ERASER: A Benchmark to Evaluate Rationalized NLP Models
TLDR
This work proposes the Evaluating Rationales And Simple English Reasoning (ERASER) benchmark to advance research on interpretable models in NLP, along with several metrics that aim to capture how well the rationales provided by models align with human rationales, and also how faithful these rationales are.
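For example, a token-level rationale agreement metric in the spirit of ERASER can be computed as F1 between predicted and human rationale tokens; this is an illustrative sketch, not the benchmark's reference implementation.

```python
# Hedged sketch: token-level F1 between predicted and human rationale tokens.
def rationale_f1(predicted_tokens, human_tokens):
    """Both arguments are sets of (document_id, token_index) pairs."""
    if not predicted_tokens or not human_tokens:
        return 0.0
    overlap = len(predicted_tokens & human_tokens)
    precision = overlap / len(predicted_tokens)
    recall = overlap / len(human_tokens)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(rationale_f1({("d1", 3), ("d1", 4), ("d1", 7)}, {("d1", 3), ("d1", 4)}))  # 0.8
```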
Reasoning about Entailment with Neural Attention
TLDR
This paper proposes a neural model that reads two sentences to determine entailment using long short-term memory units and extends this model with a word-by-word neural attention mechanism that encourages reasoning over entailments of pairs of words and phrases, and presents a qualitative analysis of attention weights produced by this model.
Generating Token-Level Explanations for Natural Language Inference
TLDR
It is shown that it is possible to generate token-level explanations for NLI without the need for training data explicitly annotated for this purpose, using a simple LSTM architecture and evaluating both LIME and Anchor explanations for this task.
Learning When to Concentrate or Divert Attention: Self-Adaptive Attention Temperature for Neural Machine Translation
TLDR
This work proposes a new model with a mechanism called Self-Adaptive Control of Temperature (SACT) to control the softness of attention by means of an attention temperature and demonstrates that the model outperforms the baseline models.
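The core mechanism, softening or sharpening attention with a temperature, is easy to sketch; below, the temperature is a fixed argument rather than the paper's adaptively predicted per-step value, so treat it as an illustration only.

```python
# Hedged sketch: temperature-controlled attention over a set of key/value vectors.
import numpy as np

def attention(query, keys, values, temperature=1.0):
    scores = keys @ query / temperature          # low T concentrates, high T diverts
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()            # softmax attention weights
    return weights @ values                      # weighted context vector

rng = np.random.default_rng(0)
q, K, V = rng.normal(size=8), rng.normal(size=(5, 8)), rng.normal(size=(5, 16))
print(attention(q, K, V, temperature=0.5).shape)  # (16,)
```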
Rationale-Augmented Convolutional Neural Networks for Text Classification
TLDR
A sentence-level convolutional model is proposed that estimates the probability that a given sentence is a rationale and weights the contribution of each sentence to the aggregate document representation in proportion to these estimates.
...