Reverse Transfer Learning: Can Word Embeddings Trained for Different NLP Tasks Improve Neural Language Models?
@article{Verwimp2019ReverseTL,
  title={Reverse Transfer Learning: Can Word Embeddings Trained for Different NLP Tasks Improve Neural Language Models?},
  author={Lyan Verwimp and Jerome R. Bellegarda},
  journal={ArXiv},
  year={2019},
  volume={abs/1909.04130}
}
Natural language processing (NLP) tasks tend to suffer from a paucity of suitably annotated training data, hence the recent success of transfer learning across a wide variety of them. The typical recipe involves: (i) training a deep, possibly bidirectional, neural network with an objective related to language modeling, for which training data is plentiful; and (ii) using the trained network to derive contextual representations that are far richer than standard linear word embeddings such as…
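To make the "reverse" direction in the title concrete, the sketch below initializes a neural language model's input embeddings with word vectors that were trained for a different NLP task and then trains (or fine-tunes) the language model on top of them. This is only a minimal illustration of the idea, not the authors' experimental setup: the pretrained matrix, the model sizes, and the WordLSTMLanguageModel class are hypothetical placeholders.

```python
# Minimal sketch (not the paper's setup): seed a neural LM's embedding layer with
# word vectors trained for a *different* NLP task, i.e. the "reverse transfer" direction.
# `pretrained` is a hypothetical (vocab_size x emb_dim) array of such embeddings.
import numpy as np
import torch
import torch.nn as nn

vocab_size, emb_dim, hidden_dim = 10000, 300, 512
pretrained = np.random.randn(vocab_size, emb_dim).astype("float32")  # placeholder matrix

class WordLSTMLanguageModel(nn.Module):
    def __init__(self, pretrained_weights, freeze_embeddings=False):
        super().__init__()
        # Copy the externally trained embeddings into the LM's embedding layer.
        self.embed = nn.Embedding.from_pretrained(
            torch.from_numpy(pretrained_weights), freeze=freeze_embeddings
        )
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids):
        h, _ = self.lstm(self.embed(token_ids))
        return self.out(h)  # next-word logits at every position

model = WordLSTMLanguageModel(pretrained, freeze_embeddings=False)
logits = model(torch.randint(0, vocab_size, (2, 20)))  # (batch=2, seq_len=20, vocab)
```

Whether the transferred embeddings are frozen or fine-tuned (the freeze_embeddings flag above) is exactly the kind of design choice such experiments would vary.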
2 Citations
DeepKAF: A Heterogeneous CBR & Deep Learning Approach for NLP Prototyping
- Computer Science · 2020 International Conference on INnovations in Intelligent SysTems and Applications (INISTA)
- 2020
DeepKAF, a heterogeneous CBR framework that combines the CBR paradigm with deep learning architectures, is presented to solve complicated natural language processing (NLP) problems (e.g., mixed-language and grammatically incorrect text).
Dense Pixel-Labeling For Reverse-Transfer And Diagnostic Learning On Lung Ultrasound For Covid-19 And Pneumonia Detection
- Computer Science · 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI)
- 2021
It is shown that dense labels help reduce false-positive detections and that segmentation-based models perform better classification when using pretrained segmentation weights, with the dense-label pretrained U-Net performing best.
References
SHOWING 1-10 OF 33 REFERENCES
Language Models are Unsupervised Multitask Learners
- Computer Science
- 2019
It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
Deep Contextualized Word Representations
- Computer Science · NAACL
- 2018
A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Computer Science · NAACL
- 2019
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Universal Language Model Fine-tuning for Text Classification
- Computer Science · ACL
- 2018
This work proposes Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduces techniques that are key for fine-tuning a language model.
Larger-Context Language Modelling with Recurrent Neural Network
- Computer Science · ACL
- 2016
It is discovered that content words, including nouns, adjectives and verbs, benefit most from an increasing number of context sentences, which suggests that the larger-context language model improves on the unconditional language model by capturing the theme of a document better and more easily.
Adapting Pre-trained Word Embeddings For Use In Medical Coding
- Computer Science · BioNLP
- 2017
A method is presented for adding task-specific information to pre-trained word embeddings to improve their utility, showing that adding such extra information is both possible and beneficial for the task at hand.
Context dependent recurrent neural network language model
- Computer Science · 2012 IEEE Spoken Language Technology Workshop (SLT)
- 2012
This paper improves the performance of recurrent neural network language models by providing, alongside each word, a contextual real-valued input vector that conveys information about the sentence being modeled; the vector is obtained by performing Latent Dirichlet Allocation on a block of preceding text.
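As a rough illustration of that mechanism (not the paper's exact implementation), the sketch below feeds an RNN language model the concatenation of each word embedding with a fixed-size context vector; the LDA topic distribution over preceding text is replaced by a random placeholder, and all dimensions are arbitrary.

```python
# Illustrative sketch: an RNN LM whose input at each time step is the word embedding
# concatenated with a document-level context vector (e.g. an LDA topic distribution
# over a block of preceding text). The LDA step is stubbed out with a placeholder.
import torch
import torch.nn as nn

vocab_size, emb_dim, topic_dim, hidden_dim = 10000, 200, 40, 400

embed = nn.Embedding(vocab_size, emb_dim)
rnn = nn.RNN(emb_dim + topic_dim, hidden_dim, batch_first=True)
out = nn.Linear(hidden_dim, vocab_size)

token_ids = torch.randint(0, vocab_size, (1, 15))        # one sentence of 15 tokens
topic_vector = torch.softmax(torch.randn(topic_dim), 0)  # placeholder for LDA output

# Tile the context vector across time and concatenate it with each word embedding.
context = topic_vector.expand(1, token_ids.size(1), topic_dim)
inputs = torch.cat([embed(token_ids), context], dim=-1)
hidden, _ = rnn(inputs)
logits = out(hidden)  # next-word predictions conditioned on words + topic context
```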
LSTM Neural Networks for Language Modeling
- Computer Science · INTERSPEECH
- 2012
This work analyzes the Long Short-Term Memory neural network architecture on an English and a large French language modeling task and obtains considerable improvements in WER on top of a state-of-the-art speech recognition system.
GloVe: Global Vectors for Word Representation
- Computer Science · EMNLP
- 2014
A new global log-bilinear regression model is introduced that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
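For reference, the global log-bilinear objective summarized above is a weighted least-squares fit to the logarithm of the word co-occurrence counts X_ij (notation as in the GloVe paper):

```latex
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2,
\qquad
f(x) =
\begin{cases}
  (x / x_{\max})^{\alpha} & \text{if } x < x_{\max} \\
  1 & \text{otherwise}
\end{cases}
```

where w_i and \tilde{w}_j are word and context vectors, b_i and \tilde{b}_j are biases, and the weighting function f damps the influence of rare and very frequent co-occurrences.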
Relevance-based Word Embedding
- Computer Science · SIGIR
- 2017
Both query expansion experiments on four TREC collections and query classification experiments on the KDD Cup 2005 dataset suggest that the relevance-based word embedding models significantly outperform state-of-the-art proximity-based embedding models, such as word2vec and GloVe.