Learn More
A 2D DCT-based approach to compressing acoustic features for remote speech recognition applications is presented. The coding scheme involves computing a 2D DCT on blocks of feature vectors followed by uniform scalar quantization, run-length and Huffman coding. Digit recognition experiments were conducted in which training was done with unquantized cepstral(More)
— This paper describes two mechanisms that augment the common automatic speech recognition (ASR) front end and provide adaptation and isolation of local spectral peaks. A dynamic model consisting of a linear filterbank with a novel additive logarithmic adaptation stage after each filter output is proposed. An extensive series of perceptual forward masking(More)
—In this paper, we present a framework for developing source coding, channel coding and decoding as well as erasure con-cealment techniques adapted for distributed (wireless or packet-based) speech recognition. It is shown that speech recognition as opposed to speech coding, is more sensitive to channel errors than channel erasures, and appropriate channel(More)
This paper focuses on the problem of pitch tracking in noisy conditions. A method using harmonic information in the residual signal is presented. The proposed criterion is used both for pitch estimation, as well as for determining the voicing segments of speech. In the experiments, the method is compared to six state-of-the-art pitch trackers on the Keele(More)
This study examines relationships between external face movements, tongue movements, and speech acoustics for consonant-vowel (CV) syllables and sentences spoken by two male and two female talkers with different visual intelligibility ratings. The questions addressed are how relationships among measures vary by syllable, whether talkers who are more(More)
Vocal Tract Length Normalization (VTLN) for standard filterbank-based Mel Frequency Cepstral Coefficient (MFCC) features is usually implemented by warping the center frequencies of the Mel filterbank, and the warping factor is estimated using the maximum likelihood score (MLS) criterion (Lee and Rose, 1998). A linear transform (LT) equivalent for frequency(More)
While vocal tract resonances (VTRs, or formants that are defined as such resonances) are known to play a critical role in human speech perception and in computer speech processing, there has been a lack of standard databases needed for the quantitative evaluation of automatic VTR extraction techniques. We report in this paper on our recent effort to create(More)