Feature-rich continuous language models for speech recognition

Piotr Wojciech Mirowski, Sumit Chopra, Suhrid Balakrishnan, Srinivas Bangalore
2010 IEEE Spoken Language Technology Workshop
State-of-the-art probabilistic models of text such as n-grams require an exponential number of examples as the size of the context grows, a problem that is due to the discrete word representation. We propose to solve this problem by learning a continuous-valued and low-dimensional mapping of words, and base our predictions of the target-word probabilities on the non-linear dynamics of the latent-space representation of the words in the context window. We build on neural network-based language…
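The core idea of the abstract, mapping each context word to a continuous low-dimensional embedding and predicting the target word through a non-linear transformation of that latent representation, can be sketched as follows. This is a minimal illustrative model, not the authors' exact architecture; the dimensions, the tanh layer, and the softmax output are all assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative only, not taken from the paper)
V, d, n_ctx, h = 50, 8, 3, 16   # vocab size, embedding dim, context length, hidden units

E  = rng.normal(scale=0.1, size=(V, d))          # continuous word embeddings
W1 = rng.normal(scale=0.1, size=(n_ctx * d, h))  # context -> hidden (non-linear map)
W2 = rng.normal(scale=0.1, size=(h, V))          # hidden -> vocabulary logits

def next_word_probs(context_ids):
    """P(w_t | context) from the concatenated embeddings of the context window."""
    x = E[context_ids].reshape(-1)        # concatenate the context-word embeddings
    z = np.tanh(x @ W1)                   # non-linear latent representation
    logits = z @ W2
    p = np.exp(logits - logits.max())     # numerically stable softmax
    return p / p.sum()

p = next_word_probs([3, 17, 42])          # a normalized distribution over the vocabulary
```

Because similar words end up with nearby embeddings, probability mass generalizes smoothly to contexts never seen in training, which is exactly what discrete n-gram counts cannot do.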


Dependency Recurrent Neural Language Models for Sentence Completion

This paper shows how the performance of the recurrent neural network (RNN) language model can be improved by incorporating the syntactic dependencies of a sentence, which bring relevant contexts closer to the word being predicted.

Language Models With Meta-information

The results reported in this thesis show that meta-information can improve the effectiveness of language models at the cost of increased training time; a subsampling stochastic gradient descent algorithm is proposed to accelerate the training of recurrent neural network language models.

Integrating meta-information into recurrent neural network language models

Time Series Modeling with Hidden Variables and Gradient-Based Algorithms

The hypothesis is that principled inference of hidden variables is achievable in the energy-based framework through gradient-based optimization, which finds the minimum-energy state sequence given the observations and enables higher-order nonlinearities than graphical models.

Improvised Theatre Alongside Artificial Intelligences

This study presents the first report of Artificial Improvisation, improvisational theatre performed live on stage alongside an artificial-intelligence-based improvisational performer, and introduces Pyggy and A.L.Ex.

Hierarchical Probabilistic Neural Network Language Model

A hierarchical decomposition of the conditional probabilities, constrained by prior knowledge extracted from the WordNet semantic hierarchy, is introduced; it yields a speed-up of about 200 during both training and recognition.
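The speed-up described above comes from replacing one softmax over the whole vocabulary with a sequence of binary decisions along a tree path, so each prediction costs O(log V) instead of O(V). A minimal sketch of the mechanism, using a balanced binary tree with random node parameters rather than a WordNet-derived hierarchy (all names and sizes here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: V must be a power of 2 for this balanced-tree sketch
V, h = 8, 4                                           # vocabulary size, context-vector size
depth = int(np.log2(V))                               # path length from root to each word
node_vecs = rng.normal(scale=0.1, size=(V - 1, h))    # one parameter vector per internal node

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def word_prob(word_id, context_h):
    """P(word | context) as a product of `depth` binary branch decisions."""
    p, node = 1.0, 0
    for level in range(depth):
        bit = (word_id >> (depth - 1 - level)) & 1    # branch taken toward this word
        q = sigmoid(node_vecs[node] @ context_h)      # P(take right branch | context)
        p *= q if bit else (1.0 - q)
        node = 2 * node + 1 + bit                     # descend to the chosen child
    return p

ctx = rng.normal(size=h)
total = sum(word_prob(w, ctx) for w in range(V))      # sums to 1 by construction
```

Since the binary decisions at every internal node each sum to one, the leaf probabilities form a proper distribution without ever computing a normalizer over the full vocabulary.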

Hierarchical Distributed Representations for Statistical Language Modeling

This paper shows how to learn hierarchical, distributed representations of word contexts that maximize the predictive value of a statistical language model, and demonstrates consistent improvement over class-based bigram models.

A Neural Probabilistic Language Model

This work proposes to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences.


This paper describes a new approach that performs the estimation of the language model probabilities in a continuous space, allowing by these means smooth interpolation of unobserved n-grams.

Continuous Space Language Models for Statistical Machine Translation

This work proposes a new statistical language model based on a continuous representation of the words in the vocabulary, which achieves consistent improvements in the BLEU score on the development and test data.

Continuous space language modeling techniques

It is demonstrated that, in a speech-to-speech translation task, TMLMs provide significant relative improvements in Character Error Rate for Mandarin speech recognition of over 16% and 10% over the trigram and NNLM models, respectively.

Improving a statistical language model through non-linear prediction

Three new graphical models for statistical language modelling

It is shown how real-valued distributed representations for words can be learned at the same time as a large set of stochastic binary hidden features that are used to predict the distributed representation of the next word from the previous distributed representations.

A unified architecture for natural language processing: deep neural networks with multitask learning

We describe a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic…

A Scalable Hierarchical Distributed Language Model

A fast hierarchical language model is introduced, along with a simple feature-based algorithm for the automatic construction of word trees from data; the resulting models can outperform non-hierarchical neural models as well as the best n-gram models.