Supervised Learning of Universal Sentence Representations from Natural Language Inference Data

@article{Conneau2017SupervisedLO,
  title={Supervised Learning of Universal Sentence Representations from Natural Language Inference Data},
  author={Alexis Conneau and Douwe Kiela and Holger Schwenk and Lo{\"i}c Barrault and Antoine Bordes},
  journal={ArXiv},
  year={2017},
  volume={abs/1705.02364}
}
Many modern NLP systems rely on word embeddings, previously trained in an unsupervised manner on large corpora, as base features. Efforts to obtain embeddings for larger chunks of text, such as sentences, have, however, not been as successful. Several attempts at learning unsupervised representations of sentences have not reached performance satisfactory enough to be widely adopted. In this paper, we show how universal sentence representations trained using the supervised data of the Stanford…
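The abstract above describes training a general-purpose sentence encoder on labeled natural language inference data. Below is a minimal, hedged sketch of that kind of setup in PyTorch: a BiLSTM encoder with max pooling, and premise/hypothesis vectors combined as [u; v; |u - v|; u * v] before a small classifier. Class names, dimensions, and the vocabulary handling are illustrative assumptions, not the authors' released code.

    import torch
    import torch.nn as nn

    class BiLSTMMaxEncoder(nn.Module):
        """Encode batches of token ids into fixed-size sentence vectors (sketch)."""
        def __init__(self, vocab_size, emb_dim=300, hidden_dim=512):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
            self.lstm = nn.LSTM(emb_dim, hidden_dim, bidirectional=True, batch_first=True)

        def forward(self, token_ids):                     # (batch, seq_len)
            states, _ = self.lstm(self.embed(token_ids))  # (batch, seq_len, 2 * hidden_dim)
            sentence_vec, _ = states.max(dim=1)           # max pooling over time
            return sentence_vec

    class NLIClassifier(nn.Module):
        """Combine premise/hypothesis vectors and predict entailment/neutral/contradiction."""
        def __init__(self, encoder, hidden_dim=512, n_classes=3):
            super().__init__()
            self.encoder = encoder
            self.mlp = nn.Sequential(
                nn.Linear(4 * 2 * hidden_dim, 512), nn.ReLU(), nn.Linear(512, n_classes))

        def forward(self, premise_ids, hypothesis_ids):
            u = self.encoder(premise_ids)
            v = self.encoder(hypothesis_ids)
            features = torch.cat([u, v, torch.abs(u - v), u * v], dim=1)
            return self.mlp(features)

Training then amounts to minimizing cross-entropy on the three NLI labels; afterwards the encoder alone is reused as the sentence-embedding model.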
Supervised Contextual Embeddings for Transfer Learning in Natural Language Processing Tasks
TLDR
This work focuses on extracting representations from multiple pre-trained supervised models, which enriches word embeddings with task- and domain-specific knowledge.
InferLite: Simple Universal Sentence Representations from Natural Language Inference Data
TLDR
A lightweight version of InferSent, called InferLite, is proposed that does not use any recurrent layers and operates on a collection of pre-trained word embeddings; a semantic hashing layer is also described that allows the model to learn generic binary codes for sentences.
DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations
TLDR
Inspired by recent advances in deep metric learning (DML), this work carefully designs a self-supervised objective for learning universal sentence embeddings that does not require labelled training data and closes the performance gap between unsupervised and supervised pretraining for universal sentence encoders.
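The entry above centers on a self-supervised contrastive objective. As a rough illustration only (not DeCLUTR's exact objective or sampling scheme), a generic in-batch InfoNCE-style loss over anchor/positive span embeddings looks like this:

    import torch
    import torch.nn.functional as F

    def info_nce_loss(anchors, positives, temperature=0.1):
        """Generic in-batch contrastive loss (illustrative, not DeCLUTR's exact loss).
        anchors, positives: (batch, dim) embeddings of two spans from the same document;
        each anchor's positive serves as a negative for every other anchor."""
        anchors = F.normalize(anchors, dim=1)
        positives = F.normalize(positives, dim=1)
        logits = anchors @ positives.t() / temperature   # (batch, batch) cosine similarities
        targets = torch.arange(anchors.size(0))          # matching pairs lie on the diagonal
        return F.cross_entropy(logits, targets)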
Sentence embeddings in NLI with iterative refinement encoders
TLDR
This work proposes a hierarchy of bidirectional LSTM and max pooling layers that implements an iterative refinement strategy and yields state-of-the-art results on the SciTail dataset as well as strong results for Stanford Natural Language Inference and Multi-Genre Natural Language Inference.
Mining Discourse Markers for Unsupervised Sentence Representation Learning
TLDR
This work proposes a method to automatically discover sentence pairs with relevant discourse markers and applies it to massive amounts of data, using the resulting pairs as supervision for learning transferable sentence embeddings.
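As a toy illustration of the general idea (pairing a sentence with a following sentence that opens with an explicit discourse marker), and not the paper's actual mining pipeline, one could do something like:

    import re

    # Hypothetical marker list; the paper mines its own set automatically.
    MARKERS = ("but", "because", "although", "so", "then")

    def mine_marker_pairs(sentences):
        """Yield (first_sentence, marker, rest) triples when the next sentence
        starts with a known discourse marker."""
        pairs = []
        for prev, curr in zip(sentences, sentences[1:]):
            match = re.match(r"^(%s)\b[,\s]*(.+)" % "|".join(MARKERS), curr, flags=re.IGNORECASE)
            if match:
                pairs.append((prev, match.group(1).lower(), match.group(2)))
        return pairs

    print(mine_marker_pairs(["It was raining.", "But we went out anyway."]))
    # [('It was raining.', 'but', 'we went out anyway.')]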
Training Effective Neural Sentence Encoders from Automatically Mined Paraphrases
TLDR
A method is proposed for training effective language-specific sentence encoders without manually labeled data: a dataset of paraphrase pairs is automatically constructed from sentence-aligned bilingual text corpora and used to tune a Transformer language model with an additional recurrent pooling layer.
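One simple way to pivot paraphrase candidates out of a sentence-aligned bilingual corpus is to group source sentences that share an identical translation. This is only a toy sketch of that idea; the paper's actual construction and filtering are more involved.

    from collections import defaultdict
    from itertools import combinations

    def pivot_paraphrases(aligned_pairs):
        """aligned_pairs: iterable of (source_sentence, target_translation).
        Source sentences sharing the same target translation are treated
        as paraphrase candidates (toy pivoting heuristic)."""
        by_target = defaultdict(set)
        for src, tgt in aligned_pairs:
            by_target[tgt].add(src)
        paraphrases = []
        for sources in by_target.values():
            paraphrases.extend(combinations(sorted(sources), 2))
        return paraphrases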
Unsupervised Learning of Sentence Representations Using Sequence Consistency
TLDR
This work proposes ConsSent, a simple yet surprisingly powerful unsupervised method that learns such representations by enforcing consistency constraints on sequences of tokens, training sentence encoders to distinguish between consistent and inconsistent examples.
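As a rough sketch of one such consistency constraint (ConsSent defines several), negative examples can be produced by perturbing the token order of a real sentence; the encoder is then trained to separate originals from perturbed copies.

    import random

    def make_inconsistent(tokens, rng=random):
        """Return a corrupted copy of a token sequence by swapping two positions
        (one possible perturbation; the paper uses several kinds of constraints)."""
        if len(tokens) < 2:
            return list(tokens)
        i, j = rng.sample(range(len(tokens)), 2)
        corrupted = list(tokens)
        corrupted[i], corrupted[j] = corrupted[j], corrupted[i]
        return corrupted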
DisSent: Learning Sentence Representations from Explicit Discourse Relations
TLDR
It is demonstrated that the automatically curated corpus allows a bidirectional LSTM sentence encoder to yield high quality sentence embeddings and can serve as a supervised fine-tuning dataset for larger models such as BERT.

References

A large annotated corpus for learning natural language inference
TLDR
The Stanford Natural Language Inference corpus is introduced, a new, freely available collection of labeled sentence pairs, written by humans doing a novel grounded task based on image captioning, which allows a neural network-based model to perform competitively on natural language inference benchmarks for the first time.
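For concreteness, a minimal loader for the corpus's JSONL release might look like the sketch below; the field names (gold_label, sentence1, sentence2) follow the standard distribution, and the file path is hypothetical.

    import json

    def load_snli(path):
        """Read SNLI-style JSONL and keep only examples with a gold label."""
        examples = []
        with open(path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                if record.get("gold_label") in ("entailment", "neutral", "contradiction"):
                    examples.append((record["sentence1"], record["sentence2"], record["gold_label"]))
        return examples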
Learning Distributed Representations of Sentences from Unlabelled Data
TLDR
A systematic comparison of models that learn distributed phrase or sentence representations from unlabelled data finds that the optimal approach depends critically on the intended application.
Natural Language Processing (Almost) from Scratch
We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity recognition, and semantic role labeling.
Towards Universal Paraphrastic Sentence Embeddings
TLDR
This work considers the problem of learning general-purpose, paraphrastic sentence embeddings based on supervision from the Paraphrase Database, and compares six compositional architectures, finding that the most complex architectures, such as long short-term memory (LSTM) recurrent neural networks, perform best on the in-domain data.
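The simplest compositional architecture in that comparison is plain word averaging; a minimal sketch, assuming a dict of pre-trained word vectors, is:

    import numpy as np

    def average_embedding(tokens, word_vectors, dim=300):
        """Average pre-trained word vectors; unknown words are skipped.
        word_vectors: dict mapping token -> np.ndarray of shape (dim,)."""
        vectors = [word_vectors[t] for t in tokens if t in word_vectors]
        if not vectors:
            return np.zeros(dim)
        return np.mean(vectors, axis=0)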
Skip-Thought Vectors
We describe an approach for unsupervised learning of a generic, distributed sentence encoder. Using the continuity of text from books, we train an encoder-decoder model that tries to reconstruct the surrounding sentences of an encoded passage.
A unified architecture for natural language processing: deep neural networks with multitask learning
We describe a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic roles, semantically similar words, and the likelihood that the sentence makes sense (grammatically and semantically) using a language model.
A Simple but Tough-to-Beat Baseline for Sentence Embeddings
Learning Natural Language Inference using Bidirectional LSTM model and Inner-Attention
TLDR
A sentence encoding-based model for recognizing textual entailment that uses the sentence's first-stage representation to attend over the words appearing in the sentence itself, a mechanism called "Inner-Attention" in this paper.
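A rough sketch of that mechanism (the dimensions and the dot-product scoring are assumptions, not necessarily the paper's exact formulation): the mean-pooled first-stage representation is used as a query over the sentence's own BiLSTM states.

    import torch
    import torch.nn.functional as F

    def inner_attention(hidden_states):
        """hidden_states: (batch, seq_len, dim) BiLSTM outputs.
        The mean-pooled first-stage representation attends over the
        sentence's own positions to produce the final sentence vector."""
        first_stage = hidden_states.mean(dim=1, keepdim=True)           # (batch, 1, dim)
        scores = torch.bmm(first_stage, hidden_states.transpose(1, 2))  # (batch, 1, seq_len)
        weights = F.softmax(scores, dim=-1)
        return torch.bmm(weights, hidden_states).squeeze(1)             # (batch, dim)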
Enriching Word Vectors with Subword Information
TLDR
A new approach based on the skip-gram model in which each word is represented as a bag of character n-grams and its vector is the sum of these n-gram representations, achieving state-of-the-art performance on word similarity and analogy tasks.
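A toy sketch of that subword idea, representing a word as the sum of its character n-gram vectors; the real model hashes n-grams into buckets and learns the vectors with the skip-gram objective.

    import numpy as np

    def char_ngrams(word, n_min=3, n_max=6):
        """Extract character n-grams of a word wrapped in boundary symbols."""
        wrapped = "<" + word + ">"
        grams = [wrapped]  # the full word is kept as one unit too
        for n in range(n_min, n_max + 1):
            grams += [wrapped[i:i + n] for i in range(len(wrapped) - n + 1)]
        return grams

    def word_vector(word, ngram_vectors, dim=100):
        """Sum the vectors of the word's character n-grams (toy lookup table)."""
        parts = [ngram_vectors[g] for g in char_ngrams(word) if g in ngram_vectors]
        return np.sum(parts, axis=0) if parts else np.zeros(dim)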
Distributed Representations of Sentences and Documents
TLDR
Paragraph Vector is an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of texts, such as sentences, paragraphs, and documents, and its construction gives the algorithm the potential to overcome the weaknesses of bag-of-words models.
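A common way to try Paragraph Vector in practice is gensim's Doc2Vec implementation; the corpus and hyperparameters below are purely illustrative.

    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    corpus = [
        TaggedDocument(words="the cat sat on the mat".split(), tags=[0]),
        TaggedDocument(words="dogs chase cats in the yard".split(), tags=[1]),
    ]
    model = Doc2Vec(corpus, vector_size=50, window=2, min_count=1, epochs=40)
    vector = model.infer_vector("a cat on a mat".split())   # fixed-length paragraph vector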