Connectionist language modeling for large vocabulary continuous speech recognition

  • Holger Schwenk, Jean-Luc Gauvain
  • Published 13 May 2002
  • Computer Science
  • 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing
This paper describes ongoing work on a new approach to language modeling for large vocabulary continuous speech recognition. A neural network is used to learn the projection of the words onto a continuous space and to estimate the n-gram probabilities. The connectionist language model is being evaluated on the DARPA HUB5 conversational telephone speech recognition task, and preliminary results show consistent improvements in both perplexity and word error rate.
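The two roles of the network named in the abstract (projecting words onto a continuous space, then estimating n-gram probabilities) can be illustrated with a minimal forward-pass sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names `init_params` and `nnlm_probs`, the layer sizes, the tanh hidden layer, the omission of bias terms, and the random untrained weights.

```python
import math
import random


def init_params(vocab_size, context_len, proj_dim, hidden_dim, seed=0):
    """Create small random weights (hypothetical sizes, untrained)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        # One continuous vector per vocabulary word (the projection).
        "proj": matrix(vocab_size, proj_dim),
        # Hidden layer reads the concatenated context projections.
        "hidden": matrix(hidden_dim, context_len * proj_dim),
        # Output layer scores every word in the vocabulary.
        "output": matrix(vocab_size, hidden_dim),
    }


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def nnlm_probs(context_ids, params):
    """Return P(w | context) for every word w in the vocabulary."""
    # Look up and concatenate the continuous vectors of the context words.
    x = [value for i in context_ids for value in params["proj"][i]]
    h = [math.tanh(a) for a in matvec(params["hidden"], x)]
    scores = matvec(params["output"], h)
    # Softmax, shifted by the maximum score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Trigram-style usage: two context word indices predict the next word.
params = init_params(vocab_size=10, context_len=2, proj_dim=4, hidden_dim=8)
probs = nnlm_probs([3, 7], params)
```

Because every context maps to a point in the continuous space, the sketch assigns a nonzero probability to any next word, including n-grams never seen in training. Training the weights (for example by gradient descent on the log-likelihood) is left out.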


Continuous space language models
Efficient training of large neural networks for language modeling
  • H. Schwenk
  • Computer Science
  • 2004 IEEE International Joint Conference on Neural Networks
  • 2004
The described approach achieves significant word error reductions with respect to a carefully tuned 4-gram backoff language model in a state-of-the-art conversational speech recognizer for the DARPA rich transcription evaluations.
Neural network language models for conversational speech recognition
The generalization behavior of the neural network LM is analyzed for in-domain training corpora varying from 7M to over 21M words, and significant word error reductions are observed compared to a carefully tuned 4-gram backoff language model in a state-of-the-art conversational speech recognizer for the NIST rich transcription evaluations.
Training Neural Network Language Models on Very Large Corpora
New algorithms to train a neural network language model on very large text corpora are presented, making it possible to use the approach in domains where several hundred million words of text are available.
This paper describes a new approach that performs the estimation of the language model probabilities in a continuous space, allowing by these means smooth interpolation of unobserved n-grams.
Language models for automatic speech recognition: construction and complexity control
Experiments on Finnish and English text corpora show that the proposed pruning method gives considerable improvements over previous pruning algorithms for Kneser-Ney smoothed models and is also better than an entropy-pruned Good-Turing smoothed model.
Building continuous space language models for transcribing European languages
The recognition of French Broadcast News and of English and Spanish parliament speeches is addressed, tasks for which fewer resources are available, and a neural network language model is applied that takes better advantage of the limited amount of training data.
Large Vocabulary SOUL Neural Network Language Models
A new training scheme is proposed for SOUL NNLMs, based on separate training of the out-of-shortlist part of the output layer, which enables using more data at each training iteration without any considerable slow-down and brings additional improvements in speech recognition performance.
Study on n-gram language models for topic and out-of-vocabulary words
This research investigated a class LM based on a latent semantic analysis (LSA) and proposed a new approach for a topic-dependent LM called topic dependent class (TDC) based n-gram, where the topic is decided in topic.
Long-Distance Continuous Space Language Modeling for Speech Recognition
A long-distance continuous language model based on latent semantic analysis (LSA) is described; it represents each word with a continuous vector that keeps the word order and position in the sentences, and it uses tied-mixture HMM modeling (TM-HMM) to robustly estimate the LM parameters and word probabilities.


A Neural Probabilistic Language Model
This work proposes to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences.
Fast decoding for indexation of broadcast data
A new decoder was implemented which transcribes broadcast data in a few times real-time with only a slight increase in word error rate compared to the best system, and experiments show that reasonable performance is still obtained with a 1.4xRT transcription system.
Neural Networks for Pattern Recognition
Language Model Adaptation
Basic theory is presented for maximum a-posteriori estimation, mixture based adaptation, and minimum discrimination information, and models to cope with long distance dependencies are also introduced.
Automatically Tuned Linear Algebra Software (ATLAS)
Fast decoding for indexation of broadcast data
  • In Proc. ICSLP, 2000.
  • 2000
Neural Networks for Pattern Recognition
  • 1995