Stochastic language models are widely used in spoken language understanding to recognize and interpret the speech signal: the speech samples are decoded into word transcriptions by means of acoustic and syntactic models and then interpreted according to a semantic model. Both for speech recognition and understanding, search algorithms use stochastic models ...
Most contemporary laboratory recognizers require too much memory to run, and are too slow for mass applications. One major cause of the problem is the large parameter space of their acoustic models. In this paper, we propose a new acoustic modeling methodology which we call subspace distribution clustering hidden Markov modeling (SDCHMM) with the aim at ...
This paper presents a new approach for multi-band based automatic speech recognition (ASR). Recent work by Bourlard and Hermansky suggests that multi-band ASR gives more accurate recognition, especially in noisy acoustic environments, by combining the likelihoods of different frequency bands. Here we evaluate this likelihood recombination (LC) approach to ...
In this paper, we present a low latency real-time Broadcast News recognition system capable of transcribing live television newscasts with reasonable accuracy. We describe our recent modeling and efficiency improvements that yield a 22% word error rate on the Hub4e98 test set while running faster than real-time. These include the discriminative training of ...
This paper describes the AT&T WATSON real-time speech recognizer, the product of several decades of research at AT&T. The recognizer handles a wide range of vocabulary sizes and is based on continuous-density hidden Markov models for acoustic modeling and finite state networks for language modeling. The recognition network is optimized for efficient ...
The design and implementation of the AT&T Communicator mixed-initiative spoken dialog system is described. The Communicator project, sponsored by DARPA and launched in 1999, is a multi-year multi-site project on advanced spoken dialog systems research. The main focus of this paper is on the issues related to the design of mixed-initiative systems. In ...
Application-specific acoustic models provide the best recognition accuracy, but they are expensive because they require the transcription of tens or hundreds of hours of in-domain speech for training. Therefore, this paper focuses on acoustic model estimation given limited in-domain transcribed speech data and large amounts of (typically available) ...