# A Linear Dynamical System Model for Text

@inproceedings{Belanger2015ALD, title={A Linear Dynamical System Model for Text}, author={David Belanger and Sham M. Kakade}, booktitle={International Conference on Machine Learning}, year={2015} }

Low dimensional representations of words allow accurate NLP models to be trained on limited annotated data. While most representations ignore words' local context, a natural way to induce context-dependent representations is to perform inference in a probabilistic latent-variable sequence model. Given the recent success of continuous vector space word representations, we provide such an inference procedure for continuous states, where words' representations are given by the posterior mean of a…
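The abstract describes representing each token by the posterior mean of a continuous latent state. Below is a minimal sketch of that inference step. It assumes a standard linear-Gaussian state-space model, h_t = A h_{t-1} + w_t and x_t = C h_t + v_t, with known parameters `A`, `C`, `Q`, `R`, `mu0`, `P0`. All of these names are illustrative and are not taken from the paper. The posterior means E[h_t | x_1..x_T] come from a Kalman filter followed by a Rauch-Tung-Striebel smoothing pass. The sketch does not reproduce how the paper encodes words as observations or how it estimates the parameters, because the truncated abstract does not say.

```python
import numpy as np

def kalman_smoother_means(X, A, C, Q, R, mu0, P0):
    """Posterior means E[h_t | x_1..x_T] for a linear-Gaussian state-space model.

    Assumed (illustrative) model:
        h_1 ~ N(mu0, P0),  h_t = A h_{t-1} + w_t,  w_t ~ N(0, Q)
        x_t = C h_t + v_t,  v_t ~ N(0, R)
    X has shape (T, obs_dim); the result has shape (T, state_dim).
    """
    T, k = X.shape[0], A.shape[0]
    m_pred, P_pred = np.zeros((T, k)), np.zeros((T, k, k))
    m_filt, P_filt = np.zeros((T, k)), np.zeros((T, k, k))
    m, P = mu0, P0
    for t in range(T):
        if t > 0:
            # Predict: push the previous filtered estimate through the dynamics A.
            m = A @ m_filt[t - 1]
            P = A @ P_filt[t - 1] @ A.T + Q
        m_pred[t], P_pred[t] = m, P
        # Update: condition the prediction on observation x_t.
        S = C @ P @ C.T + R
        K = np.linalg.solve(S, C @ P).T  # Kalman gain P C^T S^{-1}
        m_filt[t] = m + K @ (X[t] - C @ m)
        P_filt[t] = P - K @ C @ P
    # Backward (Rauch-Tung-Striebel) pass: fold in information from later tokens.
    m_smooth = m_filt.copy()
    for t in range(T - 2, -1, -1):
        J = np.linalg.solve(P_pred[t + 1], A @ P_filt[t]).T  # P_filt A^T P_pred^{-1}
        m_smooth[t] = m_filt[t] + J @ (m_smooth[t + 1] - m_pred[t + 1])
    return m_smooth
```

In a word-representation setting, `X` would hold one observation vector per token. Each row of the returned array would then act as that token's context-dependent representation, because the smoother conditions on both the left and the right context.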

## 26 Citations

### A Latent Variable Model Approach to PMI-based Word Embeddings

- Computer Science
- TACL
- 2016

This work proposes a new generative model, a dynamic version of the log-linear topic model of Mnih and Hinton (2007). It uses the model's prior to compute closed-form expressions for word statistics and shows that latent word vectors are fairly uniformly dispersed in space.

### RAND-WALK: A Latent Variable Model Approach to Word Embeddings

- Computer Science
- 2015

This work proposes a new generative model, a dynamic version of the log-linear topic model of Mnih and Hinton (2007).

### Random walks on discourse spaces: a new generative language model with applications to semantic word embeddings

- Computer Science
- 2015

This paper presents a log-linear generative model that treats the generation of a text corpus as a random walk in a latent discourse space. The model links several prior methods for finding embeddings and gives them theoretical support. It also offers interpretations of various linear algebraic structures in word embeddings obtained from nonlinear techniques.

### Dependent Multinomial Models Made Easy: Stick-Breaking with the Polya-gamma Augmentation

- Computer Science
- NIPS
- 2015

This work uses a logistic stick-breaking representation and recent innovations in Polya-gamma augmentation to reformulate the multinomial distribution in terms of latent variables with jointly Gaussian likelihoods, enabling it to take advantage of a host of Bayesian inference techniques for Gaussian models with minimal overhead.
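The logistic stick-breaking representation maps K-1 unconstrained reals psi_1..psi_{K-1} to a point on the K-simplex. Each step breaks off a sigma(psi_k) fraction of the stick that remains, so pi_k = sigma(psi_k) * prod_{j<k} (1 - sigma(psi_j)). The last category receives whatever is left. Below is a minimal sketch in plain Python. The function names are hypothetical, and the Pólya-gamma augmentation step is not shown.

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def stick_breaking(psi: list[float]) -> list[float]:
    """Map K-1 real values psi to K probabilities that sum to 1."""
    probs = []
    remaining = 1.0  # length of the stick not yet allocated
    for p in psi:
        piece = sigmoid(p) * remaining  # break off a sigma(psi_k) fraction of what is left
        probs.append(piece)
        remaining -= piece
    probs.append(remaining)  # the final category takes the leftover stick
    return probs

print(stick_breaking([0.0, 0.0]))  # [0.5, 0.25, 0.25]
```

Each pi_k depends on psi only through binary logistic terms. That property is what allows a Gaussian-friendly augmentation to be applied one stick at a time.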

### Bayes Filters and Recurrent Neural Networks

- Computer Science
- 2017

This work proposes a new class of models called Predictive State Recurrent Neural Networks (PSRNNs). PSRNNs combine the axiomatic probability theory of Bayes filters with the rich functional forms and practical success of RNNs. The paper shows that PSRNNs outperform conventional RNN architectures on a range of datasets, including both text and robotics data.

### Efficient Methods for Prediction and Control in Partially Observable Environments

- Computer Science
- 2018

The proposed framework for constructing state estimators enjoys a number of theoretical and practical advantages over existing methods. Its efficacy is demonstrated in two settings: prediction, where the task is to predict future observations, and control, where the task is to optimize a control policy via reinforcement learning.

### Intelligible Language Modeling with Input Switched Affine Networks

- Computer Science
- ArXiv
- 2016

This work presents a recurrent architecture composed of input-switched affine transformations, that is, an RNN with no nonlinearity and one set of weights per input. The paper shows that this architecture achieves near-identical performance on language modeling of Wikipedia text.
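The input-switched affine recurrence can be written as h_t = W_{x_t} h_{t-1} + b_{x_t}. Each input symbol x_t selects its own transition matrix and bias, and no nonlinearity is applied. Below is a minimal NumPy forward pass. It assumes randomly initialized parameters and a small vocabulary; the function name and sizes are illustrative, and the output layer and training are omitted.

```python
import numpy as np

def isan_states(tokens, W, b, h0):
    """Run an input-switched affine recurrence (illustrative sketch).

    tokens: sequence of integer symbol ids
    W: array (vocab, hidden, hidden), one transition matrix per symbol
    b: array (vocab, hidden), one bias vector per symbol
    h0: initial hidden state, shape (hidden,)
    Returns hidden states h_1..h_T as an array of shape (T, hidden).
    """
    if len(tokens) == 0:
        return np.zeros((0, h0.shape[0]))
    h = h0
    states = []
    for x in tokens:
        h = W[x] @ h + b[x]  # affine update selected by the current input; no nonlinearity
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
vocab, hidden = 5, 3
W = rng.normal(scale=0.3, size=(vocab, hidden, hidden))
b = rng.normal(size=(vocab, hidden))
H = isan_states([1, 4, 2], W, b, np.zeros(hidden))
print(H.shape)  # (3, 3)
```

Every step is affine, so the state after any prefix is an affine function of the initial state. This is a property of the recurrence itself, not a result reported by the paper.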

### Unsupervised Learning of Word-Sequence Representations from Scratch via Convolutional Tensor Decomposition

- Computer Science
- ArXiv
- 2016

This work proposes two components. The first is a convolutional tensor decomposition mechanism that learns a good word-sequence (phrase) dictionary in the learning phase. The second is a deconvolution framework for the decoding phase that is immune to the problem of varying sentence lengths.

### Bag-of-Vector Embeddings of Dependency Graphs for Semantic Induction

- Computer Science
- ArXiv
- 2017

This paper proposes efficient training and inference algorithms, based on tensor factorisation, for embedding arbitrary graphs in a bag-of-vector space. It demonstrates the usefulness of this representation by training bag-of-vector embeddings of dependency graphs and evaluating them on unsupervised semantic induction for the Semantic Textual Similarity and Natural Language Inference tasks.

### Learning dialogue dynamics with the method of moments

- Computer Science
- 2016 IEEE Spoken Language Technology Workshop (SLT)
- 2016

This work shows that dialogues can be modeled by SP-RFA, a class of graphical models that can be learned efficiently with the method of moments (MoM). These models can also be used directly in planning algorithms such as reinforcement learning.

## References

Showing 1-10 of 44 references.

### Neural Probabilistic Language Models

- Computer Science
- 2006

This work proposes to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences, and incorporates this new language model into a state-of-the-art speech recognizer of conversational speech.

### A Spectral Algorithm for Learning Class-Based n-gram Models of Natural Language

- Computer Science
- UAI
- 2014

This work presents a new algorithm for clustering under the Brown et al. model. The algorithm uses canonical correlation analysis (CCA) to derive a low-dimensional representation of words and then performs bottom-up hierarchical clustering over these representations. It is an order of magnitude more efficient.

### LSTM Neural Networks for Language Modeling

- Computer Science
- INTERSPEECH
- 2012

This work analyzes the Long Short-Term Memory (LSTM) neural network architecture on an English language modeling task and a large French one. It reports considerable improvements in word error rate (WER) on top of a state-of-the-art speech recognition system.

### Two Step CCA: A new spectral method for estimating vector models of words

- Computer Science
- ICML 2012
- 2012

This paper presents a new spectral method based on CCA to learn an eigenword dictionary and proves theoretically that this two-step procedure has lower sample complexity than the simple single step procedure.

### Learning Longer Memory in Recurrent Neural Networks

- Computer Science
- ICLR
- 2015

This paper shows that learning longer term patterns in real data, such as in natural language, is perfectly possible using gradient descent, by using a slight structural modification of the simple recurrent neural network architecture.

### Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space

- Computer Science
- EMNLP
- 2014

An extension to the Skip-gram model that efficiently learns multiple embeddings per word type is presented, and its scalability is demonstrated by training with one machine on a corpus of nearly 1 billion tokens in less than 6 hours.

### Multi-View Learning of Word Embeddings via CCA

- Computer Science
- NIPS
- 2011

Low Rank Multi-View Learning (LR-MVL) is extremely fast, gives guaranteed convergence to a global optimum, is theoretically elegant, and achieves state-of-the-art performance on named entity recognition (NER) and chunking problems.

### Statistical Language Models Based on Neural Networks

- Computer Science
- 2012

These models are computationally more expensive than n-gram models. Even so, the presented techniques make it possible to apply them efficiently in state-of-the-art systems, where they achieve the best published performance on the well-known Penn Treebank setup.

### Sequence to Sequence Learning with Neural Networks

- Computer Science
- NIPS
- 2014

This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.

### Word Representations: A Simple and General Method for Semi-Supervised Learning

- Computer Science
- ACL
- 2010

This work evaluates three word representations on both NER and chunking: Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings. Using near state-of-the-art supervised baselines, it finds that each of the three representations improves the accuracy of these baselines.