This paper focuses on the problem of pitch tracking in noisy conditions. A method using harmonic information in the residual signal is presented. The proposed criterion is used both for pitch estimation and for determining the voicing segments of speech. In the experiments, the method is compared to six state-of-the-art pitch trackers on the Keele …
Acoustic feedback is a problem in hearing aids that contain a substantial amount of gain, hearing aids that are used in conjunction with vented or open molds, and in-the-ear hearing aids. Acoustic feedback is annoying and reduces the maximum usable gain of hearing-aid devices. This paper studies analytically the steady-state convergence behavior of …
This paper describes two mechanisms that augment the common automatic speech recognition (ASR) front end and provide adaptation and isolation of local spectral peaks. A dynamic model consisting of a linear filterbank with a novel additive logarithmic adaptation stage after each filter output is proposed. An extensive series of perceptual forward masking …
While vocal tract resonances (VTRs, or formants that are defined as such resonances) are known to play a critical role in human speech perception and in computer speech processing, there has been a lack of standard databases needed for the quantitative evaluation of automatic VTR extraction techniques. We report in this paper on our recent effort to create …
Magnetic resonance images of the vocal tract during sustained production of the fricatives /s, ʃ, f, θ, z, ʒ, v, ð/ by four subjects are analyzed. Measurements of vocal-tract lengths and area functions, and morphological analyses of the vocal tract and tongue shapes for these sounds are presented. Interspeaker differences in area functions are found to be …
A novel Statistical Algorithm for F0 Estimation (SAFE) is proposed to improve the accuracy of F0 estimation under both clean and noisy conditions. Prominent signal-to-noise ratio (SNR) peaks in speech spectra constitute a robust information source from which F0 can be inferred. A probabilistic framework is proposed to model the effect of noise on voiced …
Vocal Tract Length Normalization (VTLN) for standard filterbank-based Mel Frequency Cepstral Coefficient (MFCC) features is usually implemented by warping the center frequencies of the Mel filterbank, and the warping factor is estimated using the maximum likelihood score (MLS) criterion (Lee and Rose, 1998). A linear transform (LT) equivalent for frequency …
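As a rough illustration of what "warping the center frequencies of the Mel filterbank" can look like, the sketch below places center frequencies uniformly on the Mel scale and then scales them by a warping factor. It is not the method of Lee and Rose (1998) or of this paper. The Mel formula (the common 2595·log10(1 + f/700) variant), the purely linear warp with clipping at the upper band edge, and all function and parameter names (`mel_center_frequencies`, `warp_centers`, `alpha`) are assumptions made for illustration.

```python
import math


def hz_to_mel(f_hz):
    # One widely used Mel-scale formula (assumed here; other variants exist).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(num_filters, f_low, f_high):
    """Return center frequencies in Hz, equally spaced on the Mel scale."""
    mel_low = hz_to_mel(f_low)
    mel_high = hz_to_mel(f_high)
    step = (mel_high - mel_low) / (num_filters + 1)
    return [mel_to_hz(mel_low + step * (i + 1)) for i in range(num_filters)]


def warp_centers(centers_hz, alpha, f_high):
    """Scale each center frequency by alpha, clipping at f_high.

    A linear warp is only one possible choice (hypothetical here);
    piecewise-linear and bilinear warps are also used in practice.
    """
    return [min(alpha * f, f_high) for f in centers_hz]


if __name__ == "__main__":
    centers = mel_center_frequencies(num_filters=26, f_low=0.0, f_high=8000.0)
    warped = warp_centers(centers, alpha=1.1, f_high=8000.0)
    for original, shifted in zip(centers[:3], warped[:3]):
        print(f"{original:8.1f} Hz -> {shifted:8.1f} Hz")
```

In a maximum-likelihood setup, a grid of candidate `alpha` values would typically be tried per speaker and the one giving the highest acoustic-model score kept; that search is omitted here.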
Most speech processing algorithms analyze speech signals frame by frame with a fixed frame rate. Fixed-rate analysis is inconsistent with human speech perception and effectively assigns the same importance or 'weight' to all equi-duration frames. In Zhu et al. (2000), we proposed a variable frame rate (VFR) analysis technique that is based on a Euclidean …
In this paper, we present a framework for developing source coding, channel coding and decoding, as well as erasure concealment techniques adapted for distributed (wireless or packet-based) speech recognition. It is shown that speech recognition, as opposed to speech coding, is more sensitive to channel errors than channel erasures, and appropriate channel …