We present an overview of Candide, a system for automatic translation of French text to English text. Candide uses methods of information theory and statistics to develop a probability model of the translation process. This model, which is made to accord as closely as possible with a large body of French and English sentence pairs, is then used to generate ...
We present a maximum entropy language model that incorporates both syntax and semantics via a dependency grammar. Such a grammar expresses the relations between words by a directed graph. Because the edges of this graph may connect words that are arbitrarily far apart in a sentence, this technique can incorporate the predictive power of words that lie ...
In this paper we define two alternatives to the familiar perplexity statistic (hereafter lexical perplexity), which is widely applied both as a measure of goodness and as an objective function for training language models. These alternatives, respectively acoustic perplexity and the synthetic acoustic word error rate, fuse information from both the ...
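For reference, the familiar lexical perplexity named in this abstract is the exponential of the average negative log-probability that a language model assigns to each word of a text. The following is a minimal Python sketch under the assumption that the per-word probabilities are already available; the function name `lexical_perplexity` and the sample values are illustrative and do not come from the paper. It does not attempt the acoustic perplexity or the synthetic acoustic word error rate, whose definitions are not given in the truncated abstract.

```python
import math


def lexical_perplexity(word_probs):
    """Perplexity of a text from the model probabilities p(w_i | history).

    word_probs: one probability in (0, 1] per word of the text
    (hypothetical input; a real language model would supply these).
    """
    if not word_probs:
        raise ValueError("need at least one word probability")
    if any(p <= 0.0 or p > 1.0 for p in word_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    # Average log-probability per word, then exponentiate its negation.
    avg_log_prob = sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(-avg_log_prob)


# A model that spreads its mass uniformly over four choices at every
# position has a perplexity equal to that branching factor.
uniform = lexical_perplexity([0.25, 0.25, 0.25, 0.25])

# A sharper model (higher probabilities on the observed words) scores lower.
sharper = lexical_perplexity([0.5, 0.5, 0.25, 0.5])
```

Lower values mean the model found the text less surprising, which is why the statistic serves both as a quality measure and as a training objective.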
We report the results of investigations in acoustic modeling, language modeling, and decoding techniques for DARPA Communicator, a speaker-independent, telephone-based dialog system. By a combination of methods, including enlarging the acoustic model, augmenting the recognizer vocabulary, conditioning the language model upon dialog state, and applying a ...
We describe an implementation of a simple probabilistic link grammar. This probabilistic language model extends trigrams by allowing a word to be predicted not only from the two immediately preceding words, but potentially from any preceding pair of adjacent words that lie within the same sentence. In this way, the trigram model can skip over less ...
This paper describes a robust, accurate, efficient, low-resource, medium-vocabulary, grammar-based speech recognition system using Hidden Markov Models for mobile applications. Among the issues and techniques we explore are improving robustness and efficiency of the front-end, using multiple microphones for removing extraneous signals from speech via a new ...
We describe techniques for enhancing the accuracy, efficiency and features of a low-resource, medium-vocabulary, grammar-based speech recognition system. Among the issues and techniques we explore are front-end speech/silence detection to reduce computational workload, the use of the Bayesian information criterion (BIC) to build smaller and better ...