We show that the standard hypothesis scoring paradigm used in maximum-likelihood-based speech recognition systems is not optimal with regard to minimizing the word error rate, the commonly used performance metric in speech recognition. This can lead to sub-optimal performance, especially in high-error-rate environments where word error and sentence error…
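As a rough illustration of the gap this abstract describes, the sketch below contrasts picking the most probable sentence with picking the hypothesis that minimizes expected word errors over an N-best list. The N-best list, its posterior probabilities, and the function names are hypothetical, and the abstract is truncated, so this is not necessarily the procedure the paper itself uses.

```python
def word_errors(ref, hyp):
    """Word-level Levenshtein distance between two lists of words."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # word of ref missing from hyp
                            curr[j - 1] + 1,          # extra word in hyp
                            prev[j - 1] + (r != h)))  # match or substitution
        prev = curr
    return prev[-1]


def map_hypothesis(nbest):
    """Standard scoring: return the single most probable sentence."""
    return max(nbest, key=lambda item: item[1])[0]


def min_expected_word_error(nbest):
    """Return the hypothesis with the lowest expected word error count,
    treating every N-best entry as a possible reference weighted by its posterior."""
    def expected_errors(candidate):
        return sum(p * word_errors(other.split(), candidate.split())
                   for other, p in nbest)
    return min((hyp for hyp, _ in nbest), key=expected_errors)


# Hypothetical N-best list with made-up posterior probabilities.
nbest = [
    ("i saw the cat", 0.40),
    ("i saw a bat", 0.35),
    ("i saw a hat", 0.25),
]

print(map_hypothesis(nbest))           # most probable sentence
print(min_expected_word_error(nbest))  # lowest expected word errors
```

In this toy list the two criteria disagree: the top-scoring sentence stands alone, while the two lower-ranked hypotheses agree with each other on most words, so one of them has the lower expected word error.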
We present a paradigm for the automatic assessment of pronunciation quality by machine. In this scoring paradigm, both native and nonnative speech data is collected, and a database of human-expert ratings is created to enable the development of a variety of machine scores. We first discuss issues related to the design of speech databases, and the reliability…
SRI International is currently involved in the development of a new generation of software systems for automatic scoring of pronunciation as part of the Voice Interactive Language Training System (VILTS) project. This paper describes the goals of the VILTS system, the speech corpus, and the algorithm development. The automatic grading system uses SRI's…
This paper addresses the issue of closed-set text-independent speaker identification from samples of speech recorded over the telephone. It focuses on the effects of acoustic mismatches between training and testing data, and concentrates on two approaches: 1) extracting features that are robust against channel variations and 2) transforming the speaker…
Abstract: This article proposes a method based on nonlinear discriminant analysis to extract and select a set of acoustic vectors used for speaker identification. The approach consists of measuring and grouping a large number of acoustic measurements (corresponding to several consecutive frames of data), and reducing the…
This paper studies the effects of handset distortion on telephone-based speaker recognition performance, resulting in the following observations: (1) the major factor in speaker recognition errors is whether the handset type (e.g., electret, carbon) is different across training and testing, not whether the telephone lines are mismatched, (2) the distribution of…
This paper proposes a probabilistic framework to define and evaluate confidence measures for word recognition. We describe a novel method to combine different knowledge sources and estimate the confidence in a word hypothesis, via a neural network. We also propose a measure of the joint performance of the recognition and confidence systems. The definitions and…
Statistics of frame-level pitch have recently been used in speaker recognition systems with good results [1, 2, 3]. Although they convey useful long-term information about a speaker's distribution of f0 values, such statistics fail to capture information about local dynamics in intonation that characterize an individual's speaking style. In this work, we…
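The limitation this abstract mentions can be seen in a small sketch: two pitch contours made of the same f0 values in a different order have the same global statistics but very different frame-to-frame movement. The contours, the use of log f0, the convention that 0 marks an unvoiced frame, and the function names are all assumptions made for illustration; the abstract is truncated and does not say which features the paper proposes.

```python
import math
import statistics


def log_f0_stats(f0_hz):
    """Mean and standard deviation of log f0 over voiced frames.
    Assumption: unvoiced frames are marked with 0 and are skipped."""
    voiced = [math.log(f) for f in f0_hz if f > 0]
    return statistics.mean(voiced), statistics.pstdev(voiced)


def mean_abs_delta(f0_hz):
    """Average absolute change in log f0 between adjacent voiced frames.
    A crude stand-in for 'local dynamics'; unvoiced frames are simply dropped."""
    voiced = [math.log(f) for f in f0_hz if f > 0]
    deltas = [abs(b - a) for a, b in zip(voiced, voiced[1:])]
    return sum(deltas) / len(deltas)


# Two made-up contours with the same f0 values in a different order.
rising = [100, 110, 120, 130, 140, 150]
zigzag = [100, 150, 110, 140, 120, 130]

print(log_f0_stats(rising), log_f0_stats(zigzag))      # global statistics
print(mean_abs_delta(rising), mean_abs_delta(zigzag))  # local movement
```

The mean and standard deviation cannot tell the steady rise from the zigzag, while even a simple delta measure separates them.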