This paper introduces a new neural network language model (NNLM) based on word clustering to structure the output vocabulary: Structured Output Layer NNLM. This model is able to handle vocabularies of arbitrary size, hence dispensing with the design of short-lists that are commonly used in NNLMs. Several softmax layers replace the standard output layer in …
This paper extends a novel neural network language model (NNLM) which relies on word clustering to structure the output vocabulary: Structured OUtput Layer (SOUL) NNLM. This model is able to handle arbitrarily-sized vocabularies, hence dispensing with the need for shortlists that are commonly used in NNLMs. Several softmax layers replace the standard output …
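The class-structured output layer described in the two abstracts above can be illustrated with a short sketch. This is a minimal, single-level simplification for intuition only, not the SOUL implementation (SOUL uses a deeper clustering hierarchy); the sizes, weight matrices, and toy vocabulary partition below are all invented for the example.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical toy setup: a hidden context vector h and a vocabulary
# partitioned into word classes (clusters). All dimensions are illustrative.
rng = np.random.default_rng(0)
hidden_dim, n_classes = 8, 4
class_words = {c: list(range(c * 5, (c + 1) * 5)) for c in range(n_classes)}  # 20-word toy vocab

# One output matrix predicts the class, plus one small matrix per class
# predicting words inside that class, instead of a single huge softmax
# over the full vocabulary.
W_class = rng.normal(size=(n_classes, hidden_dim))
W_word = {c: rng.normal(size=(len(ws), hidden_dim)) for c, ws in class_words.items()}

def word_probability(h, word, word2class):
    """P(word | h) = P(class | h) * P(word | class, h)."""
    c = word2class[word]
    p_class = softmax(W_class @ h)[c]
    in_class_index = class_words[c].index(word)
    p_word_given_class = softmax(W_word[c] @ h)[in_class_index]
    return p_class * p_word_given_class

word2class = {w: c for c, ws in class_words.items() for w in ws}
h = rng.normal(size=hidden_dim)
print(word_probability(h, word=7, word2class=word2class))
```

Because only the class softmax and one small per-class softmax are evaluated per word, the cost no longer grows linearly with the full vocabulary size, which is what removes the need for a shortlist.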
Research on language modeling for speech recognition has increasingly focused on the application of neural networks. Two competing concepts have been developed: on the one hand, feedforward neural networks representing an n-gram approach; on the other hand, recurrent neural networks that may learn context dependencies spanning more than a fixed number of …
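A minimal sketch of the two competing context models mentioned above, with invented dimensions and random weights: the feed-forward variant conditions on a fixed window of n-1 previous words, while the recurrent variant folds the entire history into its hidden state.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab, emb, hid = 20, 6, 8
E = rng.normal(size=(vocab, emb))            # toy word embedding table

# Feed-forward (n-gram style): the context is a fixed window of n-1 words,
# concatenated and mapped to a hidden state in one shot.
n = 4
W_ff = rng.normal(size=(hid, (n - 1) * emb))
def ffnn_hidden(context_ids):
    x = np.concatenate([E[w] for w in context_ids[-(n - 1):]])
    return np.tanh(W_ff @ x)

# Recurrent: the hidden state is updated word by word, so the usable
# context is not limited to a fixed number of previous words.
W_in = rng.normal(size=(hid, emb))
W_rec = rng.normal(size=(hid, hid))
def rnn_hidden(word_ids):
    h = np.zeros(hid)
    for w in word_ids:
        h = np.tanh(W_in @ E[w] + W_rec @ h)
    return h

history = [3, 7, 1, 12, 5]
print(ffnn_hidden(history).shape, rnn_hidden(history).shape)
```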
Neural Network language models (NNLMs) have recently become an important complement to conventional n-gram language models (LMs) in speech-to-text systems. However, little is known about the behavior of NNLMs. The analysis presented in this paper aims to understand which types of events are better modeled by NNLMs as compared to n-gram LMs, and in what cases …
This paper describes the speech-to-text systems used to provide automatic transcriptions for the Quaero 2010 evaluation of Machine Translation from speech. Quaero (www.quaero.org) is a large research and industrial innovation program focusing on technologies for automatic analysis and classification of multimedia and multilingual documents. The ASR …
This paper presents a continuation of research on Structured OUtput Layer Neural Network language models (SOUL NNLM) for automatic speech recognition. As SOUL NNLMs allow estimating probabilities for all in-vocabulary words, and not only for those pertaining to a limited shortlist, we investigate their performance on a large-vocabulary task. Significant …
This paper presents the results of the first Maurdor evaluation campaign. This campaign aims at evaluating the complete chain of scanned document image processing. It has a modular structure that includes page segmentation and zone classification, identification of writing type and language, optical character recognition, and revealing the logical structure of a …
This paper reports on a speech-to-text (STT) transcription system for Hungarian broadcast audio developed for the 2012 Quaero evaluations. For this evaluation, no manually transcribed audio data were provided for model training; however, a small amount of development data were provided to assess system performance. As a consequence, the acoustic models were …
This paper describes recent advances at LIMSI in Mandarin Chinese speech-to-text transcription. A number of novel approaches were introduced in the different system components. The acoustic models are trained on over 1600 hours of audio data from a range of sources, and include pitch and MLP features. N-gram and neural network language models are trained on …