• Corpus ID: 17816853

Contextual LSTM (CLSTM) models for Large scale NLP tasks

Shalini Ghosh, Oriol Vinyals, Brian Strope, Scott Roy, Tom Dean, Larry Heck
Documents exhibit sequential structure at multiple levels of abstraction (e.g., sentences, paragraphs, sections). These abstractions constitute a natural hierarchy for representing the context in which to infer the meaning of words and larger fragments of text. In this paper, we present CLSTM (Contextual LSTM), an extension of the recurrent neural network LSTM (Long-Short Term Memory) model, where we incorporate contextual features (e.g., topics) into the model. We evaluate CLSTM on three… 
TopicRNN: A Recurrent Neural Network with Long-Range Semantic Dependency
In this paper, we propose TopicRNN, a recurrent neural network (RNN)-based language model designed to directly capture the global semantic meaning relating words in a document via latent topics.
Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context
This paper investigates the role of context in an LSTM LM, through ablation studies, and analyzes the increase in perplexity when prior context words are shuffled, replaced, or dropped to provide a better understanding of how neural LMs use their context.
Neural Contextual Conversation Learning with Labeled Question-Answering Pairs
An end-to-end approach is proposed to avoid the problem of neural generative models producing generic or safe responses in different contexts; the model with contextual attention outperforms others, including state-of-the-art seq2seq models, on a perplexity test.
Efficient Transfer Learning for Neural Network Language Models
It is shown that it is possible to construct a language model from a small, focused corpus by first training an LSTM language model on a large corpus and then retraining only the internal transition model parameters on the smaller corpus.
Exploring Context’s Diversity to Improve Neural Language Model
A new cross-entropy loss function is proposed, which calculates the cross-entropy loss of the softmax outputs for any two different given contexts, and its effectiveness on the benchmark dataset is shown.
TNT-NLG, System 2: Data Repetition and Meaning Representation Manipulation to Improve Neural Generation
This paper presents “TNT-NLG” System 2, the second system submission in the E2E NLG challenge, which focuses on generating coherent natural language realizations from meaning representations (MRs) in the restaurant domain, and shows that simple modifications increase performance by providing the generator with a much larger sample of data for learning.
What comes next? Extractive summarization by next-sentence prediction
This work presents NEXTSUM, a novel approach to summarization based on a model that predicts the next sentence to include in the summary using not only the source article, but also the summary produced so far, and shows that such a model successfully captures summary-specific discourse moves, and leads to better content selection performance.
TNT-NLG, System 1: Using a Statistical NLG to Massively Augment Crowd-Sourced Data for Neural Generation
This paper presents TNT-NLG System 1, the first system submission to the E2E NLG Challenge, where it is shown that natural language (NL) realizations from meaning representations (MRs) in the restaurant domain can be generated by massively expanding the training dataset.
What is this Article about? Extreme Summarization with Topic-aware Convolutional Neural Networks
A novel abstractive model is proposed which is conditioned on the article's topics and based entirely on convolutional neural networks, outperforming an oracle extractive system and state-of-the-art abstractive approaches when evaluated automatically and by humans on the extreme summarization dataset.
What is this Article About? Extreme Summarization with Topic-Aware Convolutional Neural Networks
  • Shashi Narayan
  • Computer Science
  • 2019
This work introduces extreme summarization, a new single-document summarization task which aims at creating a short, one-sentence news summary answering the question “What is the article about?” and proposes a novel abstractive model which is conditioned on the article’s topics and based entirely on convolutional neural networks.


Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks
The Tree-LSTM, a generalization of LSTMs to tree-structured network topologies, is introduced; it outperforms all existing systems and strong LSTM baselines on two tasks: predicting the semantic relatedness of two sentences and sentiment classification.
Long Short-Term Memory Over Tree Structures
This paper proposes to extend chain-structured long short-term memory to tree structures, in which a memory cell can reflect the history memories of multiple child cells or multiple descendant cells in a recursive process, and calls the model S-LSTM, which provides a principled way of considering long-distance interaction over hierarchies.
Context dependent recurrent neural network language model
This paper improves the performance of recurrent neural network language models by providing a contextual real-valued input vector in association with each word. The vector conveys contextual information about the sentence being modeled and is obtained by performing Latent Dirichlet Allocation on a block of preceding text.
Hierarchical Recurrent Neural Network for Document Modeling
A novel hierarchical recurrent neural network language model (HRNNLM) for document modeling is proposed; it integrates sentence history information into the word-level RNN to predict the word sequence with cross-sentence contextual information.
Sequence to Sequence Learning with Neural Networks
This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.
Skip-Thought Vectors
We describe an approach for unsupervised learning of a generic, distributed sentence encoder. Using the continuity of text from books, we train an encoder-decoder model that tries to reconstruct the
Distributed Representations of Sentences and Documents
Paragraph Vector is an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of texts, such as sentences, paragraphs, and documents, and its construction gives the algorithm the potential to overcome the weaknesses of bag-of-words models.
Document Context Language Models
A set of multi-level recurrent neural network language models, called Document-Context Language Models (DCLM), which incorporate contextual information both within and beyond the sentence, are presented and empirically evaluated.
One billion word benchmark for measuring progress in statistical language modeling
A new benchmark corpus to be used for measuring progress in statistical language modeling, with almost one billion words of training data, is proposed, which is useful to quickly evaluate novel language modeling techniques, and to compare their contribution when combined with other advanced techniques.
LSTM Neural Networks for Language Modeling
This work analyzes the Long Short-Term Memory neural network architecture on an English and a large French language modeling task and gains considerable improvements in WER on top of a state-of-the-art speech recognition system.