Reverse Transfer Learning: Can Word Embeddings Trained for Different NLP Tasks Improve Neural Language Models?

Lyan Verwimp and Jerome R. Bellegarda
Natural language processing (NLP) tasks tend to suffer from a paucity of suitably annotated training data, hence the recent success of transfer learning across a wide variety of them. The typical recipe involves: (i) training a deep, possibly bidirectional, neural network with an objective related to language modeling, for which training data is plentiful; and (ii) using the trained network to derive contextual representations that are far richer than standard linear word embeddings such as… 


DeepKAF: A Heterogeneous CBR & Deep Learning Approach for NLP Prototyping
DeepKAF is a heterogeneous CBR framework that combines the CBR paradigm with deep learning architectures to solve complicated Natural Language Processing (NLP) problems (e.g., mixed-language and grammatically incorrect text).
Dense Pixel-Labeling For Reverse-Transfer And Diagnostic Learning On Lung Ultrasound For Covid-19 And Pneumonia Detection
It is shown that dense labels help reduce false positive detection and the segmentation-based models perform better classification when using pretrained segmentation weights, with the dense-label pretrained U-Net performing the best.


Language Models are Unsupervised Multitask Learners
It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
Deep Contextualized Word Representations
A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT is a new language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; it can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Universal Language Model Fine-tuning for Text Classification
This work proposes Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduces techniques that are key for fine-tuning a language model.
Larger-Context Language Modelling with Recurrent Neural Network
It is discovered that content words, including nouns, adjectives and verbs, benefit most from an increasing number of context sentences, which suggests that the larger-context language model improves on the unconditional language model by capturing the theme of a document better and more easily.
Adapting Pre-trained Word Embeddings For Use In Medical Coding
This work proposes a method to add task-specific information to pre-trained word embeddings to improve their utility, and shows that adding this extra information is possible and beneficial for the task at hand.
Context dependent recurrent neural network language model
This paper improves the performance of recurrent neural network language models by providing a contextual real-valued input vector in association with each word. The vector conveys contextual information about the sentence being modeled and is obtained by performing Latent Dirichlet Allocation on a block of preceding text.
LSTM Neural Networks for Language Modeling
This work analyzes the Long Short-Term Memory neural network architecture on an English and a large French language modeling task and gains considerable improvements in WER on top of a state-of-the-art speech recognition system.
GloVe: Global Vectors for Word Representation
A new global log-bilinear regression model is proposed that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
Relevance-based Word Embedding
Both query expansion experiments on four TREC collections and query classification experiments on the KDD Cup 2005 dataset suggest that the relevance-based word embedding models significantly outperform state-of-the-art proximity-based embedding models, such as word2vec and GloVe.