Mitch Weintraub

We present a paradigm for the automatic assessment of pronunciation quality by machine. In this scoring paradigm, both native and nonnative speech data is collected, and a database of human-expert ratings is created to enable the development of a variety of machine scores. We first discuss issues related to the design of speech databases, and the reliability ...
We show that the standard hypothesis scoring paradigm used in maximum-likelihood-based speech recognition systems is not optimal with regard to minimizing the word error rate, the commonly used performance metric in speech recognition. This can lead to sub-optimal performance, especially in high-error-rate environments where word error and sentence error ...
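The gap between picking the most probable sentence and minimizing word error can be illustrated on a toy N-best list. The sketch below is a generic illustration, not the method of the paper (the abstract is truncated here): the hypotheses, their scores, and the function names `edit_distance`, `posteriors`, and `pick_hypotheses` are all assumptions made for the example.

```python
import math


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution or match
        prev = curr
    return prev[-1]


def posteriors(log_scores):
    """Normalize log scores into posterior probabilities over the list."""
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


def pick_hypotheses(nbest):
    """Return (index of most probable hypothesis,
               index of hypothesis with lowest expected word error,
               list of expected word errors)."""
    post = posteriors([score for _, score in nbest])
    map_idx = max(range(len(nbest)), key=lambda i: post[i])
    # Expected word error of each candidate, treating every other
    # hypothesis as the possible truth, weighted by its posterior.
    expected = [
        sum(p * edit_distance(ref, hyp) for (ref, _), p in zip(nbest, post))
        for hyp, _ in nbest
    ]
    mwe_idx = min(range(len(nbest)), key=lambda i: expected[i])
    return map_idx, mwe_idx, expected


# Hypothetical 3-best list: (words, log score). Scores are made up.
nbest = [
    ("wreck a beach".split(), math.log(0.40)),
    ("recognize speech well".split(), math.log(0.35)),
    ("recognize speech swell".split(), math.log(0.25)),
]
map_idx, mwe_idx, expected = pick_hypotheses(nbest)
```

In this toy list the first hypothesis has the highest posterior, but the second has the lowest expected word error, because the second and third hypotheses share most of their words and jointly carry more probability mass.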
SRI International is currently involved in the development of a new generation of software systems for automatic scoring of pronunciation as part of the Voice Interactive Language Training System (VILTS) project. This paper describes the goals of the VILTS system, the speech corpus, and the algorithm development. The automatic grading system uses SRI’s ...
SRI is developing a system that uses real-time speech recognition to diagnose, evaluate and provide training in spoken English. The paper first describes the methods and results of a study of the feasibility of automatically grading the performance of Japanese students when reading English aloud. Utterances recorded from Japanese speakers were ...
We study a nonlinear discriminant analysis (NLDA) technique that extracts a speaker-discriminant feature set. Our approach is to train a multilayer perceptron (MLP) to maximize the separation between speakers by nonlinearly projecting a large set of acoustic features (e.g., several frames) to a lower-dimensional feature set. The extracted features are ...
This paper proposes a probabilistic framework to define and evaluate confidence measures for word recognition. We describe a novel method to combine different knowledge sources and estimate the confidence in a word hypothesis, via a neural network. We also propose a measure of the joint performance of the recognition and confidence systems. The definitions and ...
Statistics of frame-level pitch have recently been used in speaker recognition systems with good results [1, 2, 3]. Although they convey useful long-term information about a speaker’s distribution of f0 values, such statistics fail to capture information about local dynamics in intonation that characterize an individual’s speaking style. In this work, we ...
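A small sketch can show why global pitch statistics miss local dynamics: a rising contour and its time-reversed (falling) copy have identical distribution statistics. This is only an illustration under assumed conventions (f0 in Hz per frame, 0 marking unvoiced frames); the contours and the helper names `log_f0_stats` and `mean_delta` are hypothetical and are not taken from the paper.

```python
import math
import statistics


def log_f0_stats(f0_hz):
    """Mean and standard deviation of log f0 over voiced frames (f0 > 0)."""
    voiced = [math.log(f) for f in f0_hz if f > 0]
    return statistics.mean(voiced), statistics.pstdev(voiced)


def mean_delta(f0_hz):
    """Average frame-to-frame change in log f0, using only pairs of
    adjacent frames that are both voiced."""
    deltas = [math.log(b) - math.log(a)
              for a, b in zip(f0_hz, f0_hz[1:])
              if a > 0 and b > 0]
    return statistics.mean(deltas)


# Made-up contours: same f0 values, opposite direction in time.
rising = [100.0, 110.0, 120.0, 130.0, 0.0, 140.0, 150.0]
falling = list(reversed(rising))

rising_stats = log_f0_stats(rising)
falling_stats = log_f0_stats(falling)
rising_slope = mean_delta(rising)
falling_slope = mean_delta(falling)
```

The two contours give the same mean and standard deviation of log f0, while the average local slope is positive for one and negative for the other, which is the kind of information a purely distributional feature discards.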
This paper compares three techniques for recognizing continuous speech in the presence of additive car noise: 1) transforming the noisy acoustic features using a mapping algorithm, 2) adaptation of the Hidden Markov Models (HMMs), and 3) combination of mapping and adaptation. We show that at low signal-to-noise ratio (SNR) levels, compensating in the ...