Corpus ID: 570528

A Linear Dynamical System Model for Text

David Belanger and Sham M. Kakade. International Conference on Machine Learning.
Low dimensional representations of words allow accurate NLP models to be trained on limited annotated data. While most representations ignore words' local context, a natural way to induce context-dependent representations is to perform inference in a probabilistic latent-variable sequence model. Given the recent success of continuous vector space word representations, we provide such an inference procedure for continuous states, where words' representations are given by the posterior mean of a… 
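The abstract is truncated here, so the paper's exact model and learning procedure are not shown. As a rough illustration of what "posterior mean of a continuous latent state" can look like, the sketch below runs a standard Kalman filter in a generic linear-Gaussian state-space model. Everything in it is an assumption made for illustration: the one-hot encoding of words, the parameter names `A`, `C`, `Q`, `R`, the toy vocabulary, and the random parameter values. It is not the authors' implementation.

```python
import numpy as np

def kalman_filter_means(obs, A, C, Q, R, mu0, P0):
    """Filtered posterior means E[x_t | w_1..w_t] for the assumed model
    x_t = A x_{t-1} + N(0, Q),  w_t = C x_t + N(0, R)."""
    mu, P = mu0, P0
    means = []
    for w in obs:
        # Predict: push the previous posterior through the dynamics.
        mu_pred = A @ mu
        P_pred = A @ P @ A.T + Q
        # Update: correct the prediction with the observed word vector.
        S = C @ P_pred @ C.T + R                 # innovation covariance
        K = np.linalg.solve(S, C @ P_pred).T     # gain K = P_pred C^T S^{-1}
        mu = mu_pred + K @ (w - C @ mu_pred)
        P = (np.eye(len(mu)) - K @ C) @ P_pred
        means.append(mu)
    return np.stack(means)

# Hypothetical toy setup: 4-word vocabulary, 2-dimensional latent state.
rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "down"]
k, V = 2, len(vocab)
A = 0.9 * np.eye(k)              # assumed transition matrix
C = rng.normal(size=(V, k))      # assumed emission matrix
Q = 0.1 * np.eye(k)              # assumed state noise covariance
R = 0.5 * np.eye(V)              # assumed observation noise covariance

sentence = ["the", "cat", "sat"]
obs = np.eye(V)[[vocab.index(w) for w in sentence]]   # one-hot rows
reps = kalman_filter_means(obs, A, C, Q, R, np.zeros(k), np.eye(k))
print(reps.shape)   # one k-dimensional vector per token
```

Each row of `reps` depends on the words that precede it, so the same word can receive a different vector in a different sentence. That is the sense in which such representations are context-dependent. A smoother that also conditions on later words would be needed to use context on both sides.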


A Latent Variable Model Approach to PMI-based Word Embeddings

A new generative model is proposed, a dynamic version of the log-linear topic model of Mnih and Hinton (2007), which uses the prior to compute closed-form expressions for word statistics; the latent word vectors are shown to be fairly uniformly dispersed in space.

RAND-WALK: A Latent Variable Model Approach to Word Embeddings

A new generative model is proposed, a dynamic version of the log-linear topic model of Mnih and Hinton (2007).

Random walks on discourse spaces: a new generative language model with applications to semantic word embeddings

This paper presents a log-linear generative model that treats the generation of a text corpus as a random walk in a latent discourse space. The model links and provides theoretical support for several prior methods for finding embeddings, and it offers interpretations for various linear algebraic structures in word embeddings obtained from nonlinear techniques.

Dependent Multinomial Models Made Easy: Stick-Breaking with the Polya-gamma Augmentation

This work uses a logistic stick-breaking representation and recent innovations in Polya-gamma augmentation to reformulate the multinomial distribution in terms of latent variables with jointly Gaussian likelihoods, enabling it to take advantage of a host of Bayesian inference techniques for Gaussian models with minimal overhead.

Bayes Filters and Recurrent Neural Networks

This work proposes a new class of models called Predictive State Recurrent Neural Networks (PSRNNs), which combine the axiomatic probability theory of Bayes Filters with the rich functional forms and practical success of RNNs, and shows that PSRNNs outperform conventional RNN architectures on a range of datasets including both text and robotics data.

Efficient Methods for Prediction and Control in Partially Observable Environments

The proposed framework for constructing state estimators enjoys a number of theoretical and practical advantages over existing methods. Its efficacy is demonstrated in a prediction setting, where the task is to predict future observations, and in a control setting, where the task is to optimize a control policy via reinforcement learning.

Intelligible Language Modeling with Input Switched Affine Networks

A recurrent architecture composed of input-switched affine transformations, in other words an RNN without any nonlinearity and with one set of weights per input, which achieves near identical performance on language modeling of Wikipedia text.

Unsupervised Learning of Word-Sequence Representations from Scratch via Convolutional Tensor Decomposition

A convolutional tensor decomposition mechanism to learn a good word-sequence phrase dictionary in the learning phase, and a deconvolution framework that is immune to the problem of varying sentence lengths in the decode phase.

Bag-of-Vector Embeddings of Dependency Graphs for Semantic Induction

This paper proposes efficient training and inference algorithms based on tensor factorisation for embedding arbitrary graphs in a bag-of-vector space and demonstrates the usefulness of this representation by training bag-of-vector embeddings of dependency graphs and evaluating them on unsupervised semantic induction for the Semantic Textual Similarity and Natural Language Inference tasks.

Learning dialogue dynamics with the method of moments

This work shows that dialogues may be modeled by SP-RFA, a class of graphical models efficiently learnable within the MoM and directly usable in planning algorithms (such as reinforcement learning).



Neural Probabilistic Language Models

This work proposes to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences, and incorporates this new language model into a state-of-the-art speech recognizer of conversational speech.

A Spectral Algorithm for Learning Class-Based n-gram Models of Natural Language

A new algorithm for clustering under the Brown et al. model, which relies on the use of canonical correlation analysis to derive a low-dimensional representation of words and a bottom-up hierarchical clustering over these representations, which is an order of magnitude more efficient.

LSTM Neural Networks for Language Modeling

This work analyzes the Long Short-Term Memory neural network architecture on an English and a large French language modeling task and gains considerable improvements in WER on top of a state-of-the-art speech recognition system.

Two Step CCA: A new spectral method for estimating vector models of words

This paper presents a new spectral method based on CCA to learn an eigenword dictionary and proves theoretically that this two-step procedure has lower sample complexity than the simple single step procedure.

Learning Longer Memory in Recurrent Neural Networks

This paper shows that learning longer term patterns in real data, such as in natural language, is perfectly possible using gradient descent, by using a slight structural modification of the simple recurrent neural network architecture.

Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space

An extension to the Skip-gram model that efficiently learns multiple embeddings per word type is presented, and its scalability is demonstrated by training with one machine on a corpus of nearly 1 billion tokens in less than 6 hours.

Multi-View Learning of Word Embeddings via CCA

Low Rank Multi-View Learning (LR-MVL) is extremely fast, gives guaranteed convergence to a global optimum, is theoretically elegant, and achieves state-of-the-art performance on named entity recognition (NER) and chunking problems.

Statistical Language Models Based on Neural Networks

Although these models are computationally more expensive than N-gram models, with the presented techniques it is possible to apply them to state-of-the-art systems efficiently, and they achieve the best published performance on the well-known Penn Treebank setup.

Sequence to Sequence Learning with Neural Networks

This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.

Word Representations: A Simple and General Method for Semi-Supervised Learning

This work evaluates Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words on both NER and chunking, and finds that each of the three word representations improves the accuracy of these baselines.