Contextual LSTM (CLSTM) models for Large scale NLP tasks
@article{Ghosh2016ContextualL,
  title   = {Contextual LSTM (CLSTM) models for Large scale NLP tasks},
  author  = {Shalini Ghosh and Oriol Vinyals and Brian Strope and Scott Roy and Tom Dean and Larry Heck},
  journal = {ArXiv},
  year    = {2016},
  volume  = {abs/1602.06291}
}
Documents exhibit sequential structure at multiple levels of abstraction (e.g., sentences, paragraphs, sections). These abstractions constitute a natural hierarchy for representing the context in which to infer the meaning of words and larger fragments of text. In this paper, we present CLSTM (Contextual LSTM), an extension of the recurrent neural network LSTM (Long Short-Term Memory) model, where we incorporate contextual features (e.g., topics) into the model. We evaluate CLSTM on three…
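The abstract describes conditioning an LSTM language model on contextual features such as topics. A minimal sketch of one way to realize that idea, assuming the context is a fixed-length topic vector concatenated to each word embedding (the paper's exact wiring and feature set may differ):

```python
# Hypothetical sketch: condition an LSTM LM on a per-document topic vector by
# concatenating it to every word embedding. Names and dimensions are illustrative.
import torch
import torch.nn as nn

class ContextualLSTM(nn.Module):
    def __init__(self, vocab_size, embed_dim, topic_dim, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # The LSTM input is the word embedding plus the contextual features.
        self.lstm = nn.LSTM(embed_dim + topic_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, word_ids, topic_vec):
        # word_ids: (batch, seq_len); topic_vec: (batch, topic_dim)
        emb = self.embed(word_ids)                                # (batch, seq_len, embed_dim)
        ctx = topic_vec.unsqueeze(1).expand(-1, emb.size(1), -1)  # repeat over time steps
        h, _ = self.lstm(torch.cat([emb, ctx], dim=-1))
        return self.out(h)                                        # next-word logits
```

In use, `topic_vec` could be the topic posterior of the preceding text from any topic model; the same conditioning idea carries over to next-word, next-sentence, or sentence-topic prediction tasks.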
173 Citations
TopicRNN: A Recurrent Neural Network with Long-Range Semantic Dependency
- Computer Science, ICLR
- 2017
In this paper, we propose TopicRNN, a recurrent neural network (RNN)-based language model designed to directly capture the global semantic meaning relating words in a document via latent topics.…
Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context
- Computer Science, ACL
- 2018
This paper investigates the role of context in an LSTM LM, through ablation studies, and analyzes the increase in perplexity when prior context words are shuffled, replaced, or dropped to provide a better understanding of how neural LMs use their context.
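The ablation methodology in the entry above is easy to state in code: perturb the prior context and measure how much perplexity rises. A rough sketch, where `score_log_probs(context, target)` is a hypothetical stand-in for any trained LM returning per-token log-probabilities of the target words given the context:

```python
# Hedged sketch of the context-ablation protocol: compare perplexity with the
# original context against shuffled or dropped context. `score_log_probs` is
# an assumed helper wrapping whatever language model is being analyzed.
import math
import random

def perplexity(log_probs):
    return math.exp(-sum(log_probs) / len(log_probs))

def ablation_deltas(score_log_probs, context, target):
    base = perplexity(score_log_probs(context, target))
    shuffled = list(context)
    random.shuffle(shuffled)
    return {
        "shuffle_context": perplexity(score_log_probs(shuffled, target)) - base,
        "drop_context":    perplexity(score_log_probs([], target)) - base,
    }
```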
Neural Contextual Conversation Learning with Labeled Question-Answering Pairs
- Computer Science, ArXiv
- 2016
An end-to-end approach to avoid the problem in neural generative models of producing generic or safe responses in different contexts; the model with contextual attention outperforms others, including state-of-the-art seq2seq models, on a perplexity test.
Efficient Transfer Learning for Neural Network Language Models
- Computer Science, 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)
- 2018
It is shown that it is possible to construct a language model from a small, focused corpus by first training an LSTM language model on a large corpus and then retraining only the internal transition model parameters on the smaller corpus.
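The transfer recipe in the entry above (retrain only the internal transition parameters) amounts to freezing the input embedding and output softmax layers and fine-tuning only the recurrent weights. A hedged sketch against a generic PyTorch LSTM language model; layer names follow the ContextualLSTM sketch earlier, and the paper's own model and parameter split may differ:

```python
# Assumed layout: model.embed (embeddings), model.lstm (transition), model.out (softmax).
def freeze_all_but_lstm(model):
    for name, param in model.named_parameters():
        # Only the recurrent (transition) parameters stay trainable when
        # fine-tuning on the small, focused corpus.
        param.requires_grad = name.startswith("lstm")
```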
Exploring Context’s Diversity to Improve Neural Language Model
- Computer Science, 2019 International Conference on Asian Language Processing (IALP)
- 2019
A new cross-entropy loss function is proposed that computes the cross-entropy between the softmax outputs for any two different given contexts, and its effectiveness is shown on the benchmark dataset.
TNT-NLG, System 2: Data Repetition and Meaning Representation Manipulation to Improve Neural Generation
- Computer Science
- 2018
This paper presents “TNT-NLG” System 2, the second system submission in the E2E NLG challenge, which focuses on generating coherent natural language realizations from meaning representations (MRs) in the restaurant domain, and shows that simple modifications increase performance by providing the generator with a much larger sample of data for learning.
What comes next? Extractive summarization by next-sentence prediction
- Computer Science, ArXiv
- 2019
This work presents NEXTSUM, a novel approach to summarization based on a model that predicts the next sentence to include in the summary using not only the source article, but also the summary produced so far, and shows that such a model successfully captures summary-specific discourse moves, and leads to better content selection performance.
TNT-NLG, System 1: Using a Statistical NLG to Massively Augment Crowd-Sourced Data for Neural Generation
- Computer Science
- 2018
This paper presents TNT-NLG System 1, the first system submission to the E2E NLG Challenge, where it is shown that natural language (NL) realizations from meaning representations (MRs) in the restaurant domain can be generated by massively expanding the training dataset.
What is this Article about? Extreme Summarization with Topic-aware Convolutional Neural Networks
- Computer Science, J. Artif. Intell. Res.
- 2019
A novel abstractive model is proposed which is conditioned on the article's topics and based entirely on convolutional neural networks, outperforming an oracle extractive system and state-of-the-art abstractive approaches when evaluated automatically and by humans on the extreme summarization dataset.
What is this Article About? Extreme Summarization with Topic-Aware Convolutional Neural Networks
- Computer Science
- 2019
This work introduces extreme summarization, a new single-document summarization task which aims at creating a short, one-sentence news summary answering the question “What is the article about?” and proposes a novel abstractive model which is conditioned on the article’s topics and based entirely on convolutional neural networks.
References
Showing 1-10 of 62 references
Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks
- Computer Science, ACL
- 2015
The Tree-LSTM is introduced, a generalization of LSTMs to tree-structured network topologies that outperforms all existing systems and strong LSTM baselines on two tasks: predicting the semantic relatedness of two sentences and sentiment classification.
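A hedged sketch of a Child-Sum Tree-LSTM cell, following the commonly cited formulation: input, output, and update gates are computed from the node input and the sum of child hidden states, with one forget gate per child. Illustrative only; refer to the paper for the exact model and training details:

```python
# Child-Sum Tree-LSTM cell (sketch). x is the node input; child_h / child_c hold
# the children's hidden and cell states, one row per child.
import torch
import torch.nn as nn

class ChildSumTreeLSTMCell(nn.Module):
    def __init__(self, in_dim, mem_dim):
        super().__init__()
        self.iou_x = nn.Linear(in_dim, 3 * mem_dim)
        self.iou_h = nn.Linear(mem_dim, 3 * mem_dim, bias=False)
        self.f_x = nn.Linear(in_dim, mem_dim)
        self.f_h = nn.Linear(mem_dim, mem_dim, bias=False)

    def forward(self, x, child_h, child_c):
        h_sum = child_h.sum(dim=0)                      # sum of child hidden states
        i, o, u = torch.chunk(self.iou_x(x) + self.iou_h(h_sum), 3, dim=-1)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        f = torch.sigmoid(self.f_x(x).unsqueeze(0) + self.f_h(child_h))  # one forget gate per child
        c = i * u + (f * child_c).sum(dim=0)            # new memory cell
        h = o * torch.tanh(c)
        return h, c
```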
Long Short-Term Memory Over Tree Structures
- Computer Science, ArXiv
- 2015
This paper proposes to extend chain-structured long short-term memory to tree structures, in which a memory cell can reflect the history memories of multiple child cells or multiple descendant cells in a recursive process, and calls the model S-LSTM, which provides a principled way of considering long-distance interaction over hierarchies.
Context dependent recurrent neural network language model
- Computer Science, 2012 IEEE Spoken Language Technology Workshop (SLT)
- 2012
This paper improves the performance of recurrent neural network language models by providing a contextual real-valued input vector in association with each word; the vector conveys contextual information about the sentence being modeled and is obtained by performing Latent Dirichlet Allocation on a block of preceding text.
Hierarchical Recurrent Neural Network for Document Modeling
- Computer Science, EMNLP
- 2015
A novel hierarchical recurrent neural network language model (HRNNLM) for document modeling that integrates a sentence-level representation as sentence history information into the word-level RNN, predicting the word sequence with cross-sentence contextual information.
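A minimal sketch of the hierarchical idea in the entry above: a sentence-level LSTM summarizes each finished sentence, and its state is fed into the word-level LSTM as cross-sentence history. Dimensions and wiring are illustrative, not HRNNLM's exact design:

```python
# Hedged sketch of a two-level (word / sentence) recurrent language model.
import torch
import torch.nn as nn

class HierarchicalLM(nn.Module):
    def __init__(self, vocab_size, embed_dim, word_hidden, sent_hidden):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.word_lstm = nn.LSTM(embed_dim + sent_hidden, word_hidden, batch_first=True)
        self.sent_lstm = nn.LSTM(word_hidden, sent_hidden, batch_first=True)
        self.out = nn.Linear(word_hidden, vocab_size)

    def forward(self, sentences):
        # sentences: list of (batch, seq_len) token-id tensors, one tensor per sentence
        batch = sentences[0].size(0)
        history = torch.zeros(batch, self.sent_lstm.hidden_size)  # cross-sentence context
        sent_state, logits = None, []
        for sent in sentences:
            emb = self.embed(sent)
            ctx = history.unsqueeze(1).expand(-1, emb.size(1), -1)
            word_out, _ = self.word_lstm(torch.cat([emb, ctx], dim=-1))
            logits.append(self.out(word_out))                     # next-word predictions
            summary = word_out[:, -1:, :]                         # summarize this sentence
            hist_out, sent_state = self.sent_lstm(summary, sent_state)
            history = hist_out[:, -1, :]                          # updated sentence history
        return logits
```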
Sequence to Sequence Learning with Neural Networks
- Computer Science, NIPS
- 2014
This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short-term dependencies between the source and the target sentence which made the optimization problem easier.
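The source-reversal trick mentioned in the entry above is a one-line preprocessing step: reverse each source sentence (not the target) before encoding, so that early target words sit closer to the source words they depend on. A tiny illustration:

```python
# Reverse the source side of (source, target) token pairs, as in the seq2seq paper.
def reverse_sources(pairs):
    return [(list(reversed(src)), tgt) for src, tgt in pairs]

pairs = [(["je", "suis", "étudiant"], ["i", "am", "a", "student"])]
print(reverse_sources(pairs))
# [(['étudiant', 'suis', 'je'], ['i', 'am', 'a', 'student'])]
```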
Skip-Thought Vectors
- Computer Science, NIPS
- 2015
We describe an approach for unsupervised learning of a generic, distributed sentence encoder. Using the continuity of text from books, we train an encoder-decoder model that tries to reconstruct the…
Distributed Representations of Sentences and Documents
- Computer Science, ICML
- 2014
Paragraph Vector is an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of texts, such as sentences, paragraphs, and documents, and its construction gives the algorithm the potential to overcome the weaknesses of bag-of-words models.
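For reference, the Paragraph Vector idea from the entry above is available off the shelf in gensim's Doc2Vec (PV-DM by default); a brief, hedged usage sketch with toy data:

```python
# Train paragraph vectors on a toy corpus and infer a vector for unseen text.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = [
    TaggedDocument(words=["the", "cat", "sat", "on", "the", "mat"], tags=[0]),
    TaggedDocument(words=["dogs", "chase", "cats"], tags=[1]),
]
model = Doc2Vec(corpus, vector_size=50, min_count=1, epochs=40)
vec = model.infer_vector(["a", "cat", "on", "a", "mat"])  # fixed-length document vector
print(vec.shape)  # (50,)
```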
Document Context Language Models
- Computer Science, ICLR
- 2015
A set of multi-level recurrent neural network language models, called Document-Context Language Models (DCLM), which incorporate contextual information both within and beyond the sentence, are presented and empirically evaluated.
One billion word benchmark for measuring progress in statistical language modeling
- Computer Science, INTERSPEECH
- 2014
A new benchmark corpus for measuring progress in statistical language modeling, with almost one billion words of training data, is proposed; it is useful for quickly evaluating novel language modeling techniques and for comparing their contribution when combined with other advanced techniques.
LSTM Neural Networks for Language Modeling
- Computer Science, INTERSPEECH
- 2012
This work analyzes the Long Short-Term Memory neural network architecture on an English and a large French language modeling task and gains considerable improvements in WER on top of a state-of-the-art speech recognition system.