Dynamic Meta-Embeddings for Improved Sentence Representations

Douwe Kiela, Changhan Wang, Kyunghyun Cho
While one of the first steps in many NLP systems is selecting what pre-trained word embeddings to use, we argue that such a step is better left for neural networks to figure out by themselves. To that end, we introduce dynamic meta-embeddings, a simple yet effective method for the supervised learning of embedding ensembles, which leads to state-of-the-art performance within the same model class on a variety of tasks. We subsequently show how the technique can be used to shed new light on the…
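As a rough illustration of what learning an embedding ensemble can look like, the sketch below combines one word's vectors from two embedding sets: each vector is projected to a shared size, scored, and the scores are turned into softmax weights for a weighted sum. This is an assumed formulation, not necessarily the exact one in the paper; the names `project`, `dynamic_meta_embedding`, `attn_vec`, and `attn_bias` are hypothetical, and the parameters are random here, whereas in a supervised setting they would be trained together with the downstream model.

```python
import math
import random


def project(vec, weight, bias):
    """Linear projection; `weight` holds one row per output dimension."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weight, bias)]


def softmax(scores):
    """Numerically stable softmax over a list of scalars."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_meta_embedding(source_vecs, projections, attn_vec, attn_bias):
    """Combine one word's vectors from several embedding sets.

    source_vecs: one vector per embedding set (sizes may differ)
    projections: one (weight, bias) pair per set, mapping to a shared size
    attn_vec, attn_bias: hypothetical scoring parameters (assumed, not from the source)
    """
    projected = [project(v, w, b) for v, (w, b) in zip(source_vecs, projections)]
    scores = [sum(a * p for a, p in zip(attn_vec, vec)) + attn_bias
              for vec in projected]
    weights = softmax(scores)
    combined = [sum(wt * vec[d] for wt, vec in zip(weights, projected))
                for d in range(len(attn_vec))]
    return combined, weights


def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
shared_dim = 4
# Two toy "pre-trained" vectors for the same word, with different sizes.
source_vecs = [[random.uniform(-1, 1) for _ in range(3)],
               [random.uniform(-1, 1) for _ in range(5)]]
projections = [(rand_matrix(shared_dim, len(v)), [0.0] * shared_dim)
               for v in source_vecs]
attn_vec = [random.uniform(-1, 1) for _ in range(shared_dim)]

meta_vec, weights = dynamic_meta_embedding(source_vecs, projections, attn_vec, 0.0)
```

The returned `weights` hold one value per embedding set and sum to 1, so after training they could be inspected to see which source embeddings a model relies on for a given word.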

A Survey on Word Meta-Embedding Learning

This paper classifies meta-embedding (ME) learning methods according to multiple factors, such as whether they operate on static or contextualised embeddings and whether they are trained in an unsupervised manner or fine-tuned for a particular task/domain.

Dynamic Task-Specific Factors for Meta-Embedding

Dynamic task-specific factors for meta-embedding (DTFME) are introduced, which are used to calculate appropriate weights for different embedding sets without increasing complexity.

On the Choice of Auxiliary Languages for Improved Sequence Tagging

It is shown that attention-based meta-embeddings can effectively combine pre-trained embeddings from different languages for sequence tagging and set new state-of-the-art results for part-of-speech tagging in five languages.

Learning Efficient Task-Specific Meta-Embeddings with Word Prisms

This work introduces word prisms: a simple and efficient meta-embedding method that learns to combine source embeddings according to the task at hand, which allows them to be very efficient at inference time.

Denoising Word Embeddings by Averaging in a Shared Space

A method of fusing word embeddings that were trained on the same corpus but with different initializations is considered, which demonstrates consistent improvements over the raw models as well as their simplistic average, on a range of tasks.

Meta-Embeddings for Natural Language Inference and Semantic Similarity tasks

Meta-embeddings derived from a few state-of-the-art (SOTA) models are proposed to efficiently tackle mainstream NLP tasks such as classification, semantic relatedness, and text similarity, showing that meta-embeddings can serve several NLP tasks by harnessing the power of several individual representations.

Meta-Embeddings Based On Self-Attention

A new meta-embedding model based on the self-attention mechanism, namely the Duo, which achieves state-of-the-art accuracy in text classification tasks such as 20NG and is the first machine translation model based on more than one word embedding.

Unsupervised Attention-based Sentence-Level Meta-Embeddings from Contextualised Language Models

A sentence-level meta-embedding learning method that takes independently trained contextualised word embedding models and learns a sentence embedding that preserves the complementary strengths of the input source NLMs, which is unsupervised and is not tied to a particular downstream task.

MoRTy: Unsupervised Learning of Task-specialized Word Embeddings by Autoencoding

MoRTy is a simple yet effective, self-supervised post-processing method that constructs task-specialized word representations by picking from a menu of reconstructing transformations to yield improved end-task performance.

Meta-Embedding as Auxiliary Task Regularization

This work proposes to reconstruct an ensemble of word embeddings as an auxiliary task that regularises a main task while both tasks share the learned meta-embedding layer, and shows the best performance in 4 out of 6 word similarity datasets when using a cosine reconstruction loss and Brier's word similarity loss.



Learning Word Meta-Embeddings by Using Ensembles of Embedding Sets

This paper proposes an ensemble approach of combining different public embedding sets with the aim of learning meta-embeddings, and shows better performance of meta-embeddings on word similarity and analogy tasks and on part-of-speech tagging.

Learning Context-Sensitive Word Embeddings with Neural Tensor Skip-Gram Model

A general architecture to learn the word and topic embeddings efficiently is presented, which is an extension to the Skip-Gram model and can model the interaction between words and topics simultaneously.

A Survey of Word Embeddings Evaluation Methods

An extensive overview of the field of word embeddings evaluation is presented, highlighting main problems and proposing a typology of approaches to evaluation, summarizing 16 intrinsic methods and 12 extrinsic methods.

K-Embeddings: Learning Conceptual Embeddings for Words using Context

A technique for adding contextual distinctions to word embeddings by extending the usual embedding process into two phases; the technique is iterative, scalable, and can be combined with other methods to achieve still more expressive representations.

Improving Distributional Similarity with Lessons Learned from Word Embeddings

It is revealed that much of the performance gains of word embeddings are due to certain system design choices and hyperparameter optimizations, rather than the embedding algorithms themselves, and these modifications can be transferred to traditional distributional models, yielding similar gains.

context2vec: Learning Generic Context Embedding with Bidirectional LSTM

This work presents a neural model for efficiently learning a generic context embedding function from large corpora, using bidirectional LSTM, and suggests they could be useful in a wide variety of NLP tasks.

Frustratingly Easy Meta-Embedding – Computing Meta-Embeddings by Averaging Source Word Embeddings

This paper shows that the arithmetic mean of two distinct word embedding sets yields a performant meta-embedding that is comparable to or better than more complex meta-embedding learning methods.

An Exploration of Word Embedding Initialization in Deep-Learning Tasks

This work examines various random and pretrained initialization methods for embeddings used in deep networks and their effect on performance on four NLP tasks with both recurrent and convolutional architectures, and confirms that pretrained embeddings are a little better than random initialization, especially considering the speed of learning.

Deep Contextualized Word Representations

A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.

SensEmbed: Learning Sense Embeddings for Word and Relational Similarity

This work proposes a multifaceted approach that transforms word embeddings to the sense level and leverages knowledge from a large semantic network for effective semantic similarity measurement.