Corpus ID: 7138078

Semi-supervised Sequence Learning

@inproceedings{Dai2015SemisupervisedSL,
  title={Semi-supervised Sequence Learning},
  author={Andrew M. Dai and Quoc V. Le},
  booktitle={NIPS},
  year={2015}
}
We present two approaches to use unlabeled data to improve sequence learning with recurrent networks. The first approach is to predict what comes next in a sequence, which is a conventional language model in natural language processing; the second is a sequence autoencoder, which reads the input sequence into a vector and predicts the input sequence again. These two algorithms can be used as a "pretraining" algorithm for a later supervised sequence learning algorithm. In other words, the parameters obtained from the pretraining step can then be used as a starting point for other supervised training models. In our experiments, we find that long short term memory recurrent networks, after being pretrained with the two approaches, become more stable to…
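To make the recipe concrete, here is a minimal PyTorch sketch of the language-model variant of this idea: pretrain an LSTM on unlabeled text by next-token prediction, then copy its embedding and LSTM weights into a supervised classifier as a starting point. The model sizes, vocabulary, and omitted training loops are placeholder assumptions, not the authors' implementation.

# Minimal sketch: LSTM language-model pretraining, then supervised fine-tuning (PyTorch).
# Vocabulary size, dimensions, and the training loops are placeholders.
import torch.nn as nn

VOCAB, EMB, HID, CLASSES = 10000, 128, 256, 2

class LSTMLanguageModel(nn.Module):
    """Predicts the next token at every position (the unsupervised pretraining task)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.lstm = nn.LSTM(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, tokens):                       # tokens: (batch, seq_len)
        h, _ = self.lstm(self.embed(tokens))
        return self.out(h)                           # (batch, seq_len, vocab) logits

class LSTMClassifier(nn.Module):
    """Supervised model that reuses the pretrained embedding and LSTM weights."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.lstm = nn.LSTM(EMB, HID, batch_first=True)
        self.head = nn.Linear(HID, CLASSES)

    def forward(self, tokens):
        h, _ = self.lstm(self.embed(tokens))
        return self.head(h[:, -1])                   # classify from the last hidden state

lm = LSTMLanguageModel()
# ... pretrain `lm` on unlabeled text with nn.CrossEntropyLoss() on shifted targets ...

clf = LSTMClassifier()                               # warm-start from the pretrained weights
clf.embed.load_state_dict(lm.embed.state_dict())
clf.lstm.load_state_dict(lm.lstm.state_dict())
# ... fine-tune `clf` on the labeled data ...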

Citations of this paper

Semi-Supervised Sequence Modeling with Cross-View Training
TLDR
Cross-View Training (CVT), a semi-supervised learning algorithm that improves the representations of a Bi-LSTM sentence encoder using a mix of labeled and unlabeled data, is proposed and evaluated, achieving state-of-the-art results.
Unsupervised Pretraining for Sequence to Sequence Learning
TLDR
This work presents a general unsupervised learning method to improve the accuracy of sequence to sequence (seq2seq) models: the weights of the encoder and decoder are initialized with the pretrained weights of two language models and then fine-tuned with labeled data.
Semi-Supervised Learning for Text Classification by Layer Partitioning
  • Alexander Hanbo Li, A. Sethy
  • Computer Science
    ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2020
TLDR
This work proposes to decompose a neural network M into two components F and U so that M = U ◦ F, where F serves as a feature extractor that maps the input to a high-level representation, to which systematic noise is added using dropout.
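A minimal sketch of that decomposition, assuming F is a recurrent feature extractor and that the unlabeled objective asks U to agree with itself under different dropout masks; the module sizes and the consistency loss are illustrative assumptions, not the paper's code.

# Illustrative sketch of M = U ∘ F with dropout noise on the intermediate representation.
import torch.nn as nn
import torch.nn.functional as F

class FeatureExtractor(nn.Module):                   # F: tokens -> high-level representation
    def __init__(self, vocab=10000, emb=128, hid=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hid, batch_first=True)

    def forward(self, tokens):
        h, _ = self.lstm(self.embed(tokens))
        return h[:, -1]                              # last time-step as the sentence feature

class UpperPart(nn.Module):                          # U: representation -> prediction
    def __init__(self, hid=256, classes=2):
        super().__init__()
        self.drop = nn.Dropout(0.5)                  # dropout injects noise on the representation
        self.head = nn.Linear(hid, classes)

    def forward(self, rep):
        return self.head(self.drop(rep))

extractor, upper = FeatureExtractor(), UpperPart()

def unlabeled_consistency_loss(tokens):
    """On unlabeled text, encourage U to agree with itself under two different dropout masks."""
    rep = extractor(tokens)
    noisy1 = F.log_softmax(upper(rep), dim=-1)       # first noisy pass
    noisy2 = F.softmax(upper(rep), dim=-1)           # second pass draws a new dropout mask
    return F.kl_div(noisy1, noisy2, reduction="batchmean")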
Multi-task Sequence to Sequence Learning
TLDR
The results show that training on a small amount of parsing and image caption data can improve the translation quality between English and German by up to 1.5 BLEU points over strong single-task baselines on the WMT benchmarks, and reveal interesting properties of the two unsupervised learning objectives, autoencoder and skip-thought, in the MTL context.
Self-training Improves Pre-training for Natural Language Understanding
TLDR
SentAugment, a data augmentation method which computes task-specific query embeddings from labeled data to retrieve sentences from a bank of billions of unlabeled sentences crawled from the web, is introduced.
Unsupervised Sequence Classification using Sequential Output Statistics
TLDR
A stochastic primal-dual gradient method is developed that is less inclined to get stuck in trivial solutions, avoids the need for a strong generative model, and gives drastically lower errors than other baseline methods.
Semi-supervised Text Classification with Temporal Ensembling
  • Rong Xiang, Shiqun Yin
  • Computer Science
    2021 International Conference on Computer Communication and Artificial Intelligence (CCAI)
  • 2021
TLDR
A semi-supervised text classification model is proposed that combines a Bi-GRU with temporal ensembling, which assigns a pseudo label to each example in the unlabeled data, and it is shown to provide improvements over state-of-the-art methods.
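The pseudo-labeling mechanism can be sketched as follows, loosely following Laine and Aila's temporal-ensembling formulation: an exponential moving average of past predictions is kept for every unlabeled example and used as a soft target. The Bi-GRU classifier, the constants, and the per-batch ensemble update are simplifying assumptions, not the paper's exact setup.

# Sketch of temporal ensembling on unlabeled data: an exponential moving average (EMA)
# of past predictions serves as a soft pseudo label for each example.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_UNLABELED, CLASSES, ALPHA = 50_000, 2, 0.6          # ALPHA = EMA momentum

class BiGRUClassifier(nn.Module):
    def __init__(self, vocab=10000, emb=128, hid=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.gru = nn.GRU(emb, hid, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hid, CLASSES)

    def forward(self, tokens):
        h, _ = self.gru(self.embed(tokens))
        return self.head(h[:, -1])                     # logits

model = BiGRUClassifier()
ensemble = torch.zeros(N_UNLABELED, CLASSES)           # running average of past predictions

def unlabeled_step(tokens, indices, epoch):
    """One unlabeled batch: match current predictions to the bias-corrected EMA targets,
    then fold the new predictions into the ensemble (the original updates once per epoch)."""
    probs = F.softmax(model(tokens), dim=-1)
    if epoch > 0:
        targets = ensemble[indices] / (1.0 - ALPHA ** epoch)   # bias correction
    else:
        targets = probs.detach()                       # no targets yet in the first epoch
    loss = F.mse_loss(probs, targets.detach())
    ensemble[indices] = ALPHA * ensemble[indices] + (1 - ALPHA) * probs.detach()
    return loss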
Semi-supervised Variational Autoencoders for Sequence Classification
TLDR
A conditional Long Short-Term Memory network (conditional LSTM) is presented, which receives the conditioning information at all time-steps; it significantly improves classification accuracy compared with a purely supervised classifier and achieves competitive performance against previous pretraining-based methods.
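One common way for an LSTM to receive conditioning information at every time-step is to concatenate a condition embedding to each token embedding, as in this minimal sketch; the layer sizes and the use of a label embedding are assumptions, not the paper's exact architecture.

# Minimal sketch of a conditional LSTM: the conditioning vector (here a label embedding)
# is concatenated to the token embedding at every time-step.
import torch
import torch.nn as nn

class ConditionalLSTM(nn.Module):
    def __init__(self, vocab=10000, emb=128, cond=16, hid=256, classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.cond_embed = nn.Embedding(classes, cond)  # conditioning info, e.g. a (pseudo-)label
        self.lstm = nn.LSTM(emb + cond, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab)

    def forward(self, tokens, label):
        x = self.embed(tokens)                         # (batch, seq_len, emb)
        c = self.cond_embed(label).unsqueeze(1)        # (batch, 1, cond)
        c = c.expand(-1, x.size(1), -1)                # repeated at every time-step
        h, _ = self.lstm(torch.cat([x, c], dim=-1))
        return self.out(h)                             # token logits for reconstruction

# Example: decode a batch of sequences conditioned on their labels.
logits = ConditionalLSTM()(torch.randint(0, 10000, (4, 20)), torch.tensor([0, 1, 1, 0]))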
A Novel Multi-Task Learning Framework for Semi-Supervised Semantic Parsing
TLDR
This work proposes a semi-supervised semantic parsing method that exploits unlabeled natural utterances in a novel multi-task learning framework, taking entity sequences as training targets to improve the encoder's representations and reduce entity mistakes in prediction.

References

Showing 1-10 of 55 references
Sequence to Sequence Learning with Neural Networks
TLDR
This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.
Unsupervised Learning of Video Representations using LSTMs
TLDR
This work uses Long Short Term Memory networks to learn representations of video sequences and evaluates the representations by finetuning them for a supervised learning problem - human action recognition on the UCF-101 and HMDB-51 datasets.
A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data
TLDR
This paper presents a general framework in which the structural learning problem can be formulated and analyzed theoretically and related to learning with unlabeled data; algorithms for structural learning are proposed and computational issues are investigated.
Skip-Thought Vectors
We describe an approach for unsupervised learning of a generic, distributed sentence encoder. Using the continuity of text from books, we train an encoder-decoder model that tries to reconstruct the surrounding sentences of an encoded passage.
A Neural Conversational Model
TLDR
A simple approach to conversational modeling is presented which uses the recently proposed sequence to sequence framework and is able to extract knowledge from both a domain-specific dataset and a large, noisy, general-domain dataset of movie subtitles.
Show and tell: A neural image caption generator
TLDR
This paper presents a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image.
LSTM: A Search Space Odyssey
TLDR
This paper presents the first large-scale analysis of eight LSTM variants on three representative tasks: speech recognition, handwriting recognition, and polyphonic music modeling, and observes that the studied hyperparameters are virtually independent and derive guidelines for their efficient adjustment.
A Neural Probabilistic Language Model
TLDR
This work proposes to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences.
Distributed Representations of Sentences and Documents
TLDR
Paragraph Vector is an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of texts, such as sentences, paragraphs, and documents, and its construction gives the algorithm the potential to overcome the weaknesses of bag-of-words models.
Beyond short snippets: Deep networks for video classification
TLDR
This work proposes and evaluates several deep neural network architectures to combine image information across a video over longer time periods than previously attempted, and proposes two methods capable of handling full length videos.