In this paper, we generalize the relations between clean and noisy speech signals using the vector Taylor series (VTS) expansion for noise-robust speech recognition. We use it for both noisy data compensation and hidden Markov model (HMM) parameter adaptation, and apply it directly in the cepstral domain, while Moreno used it to estimate the log-spectral …
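As a rough illustration of the kind of clean/noisy relation that a VTS expansion linearizes, the sketch below uses the widely cited scalar log-spectral mismatch function y = x + log(1 + exp(n - x)) for additive noise. This is an assumption made for illustration only: it is not the cepstral-domain formulation of the paper, it ignores any channel term, and the function names are hypothetical.

```python
import math

def mismatch(x, n):
    """Noisy log-spectrum y given clean log-spectrum x and noise log-spectrum n.
    Assumes additive noise only (no channel distortion)."""
    return x + math.log1p(math.exp(n - x))

def vts_first_order(x, n, x0, n0):
    """First-order Taylor approximation of mismatch() around the point (x0, n0)."""
    g = 1.0 / (1.0 + math.exp(n0 - x0))  # partial derivative dy/dx at (x0, n0)
    # dy/dn at (x0, n0) equals 1 - g, so the two gradients sum to one.
    return mismatch(x0, n0) + g * (x - x0) + (1.0 - g) * (n - n0)
```

Near the expansion point the linearized value stays close to the exact one, which is what makes closed-form compensation and model adaptation tractable in this family of methods.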
One of the key issues in practical speech processing is to achieve robust voice activity detection (VAD) against the background noise. Most of the statistical model-based approaches have tried to employ the Gaussian assumption in the discrete Fourier transform (DFT) domain, which, however, deviates from the real observation. In this paper, we propose a …
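For context, one common formulation of the Gaussian baseline mentioned above computes a per-bin likelihood ratio from the a posteriori and a priori signal-to-noise ratios and averages its logarithm over frequency. The following is a minimal sketch under that assumption; the function names, the arguments, and the zero threshold are hypothetical and not taken from the paper.

```python
import math

def gaussian_log_lr(power_spectrum, noise_var, prior_snr):
    """Per-bin log likelihood ratio under a complex Gaussian model of the DFT coefficients."""
    log_lrs = []
    for p, lam_n, xi in zip(power_spectrum, noise_var, prior_snr):
        post_snr = p / lam_n  # a posteriori SNR for this frequency bin
        log_lrs.append(post_snr * xi / (1.0 + xi) - math.log(1.0 + xi))
    return log_lrs

def vad_decision(power_spectrum, noise_var, prior_snr, threshold=0.0):
    """Declare speech when the mean log likelihood ratio exceeds the threshold."""
    log_lrs = gaussian_log_lr(power_spectrum, noise_var, prior_snr)
    return sum(log_lrs) / len(log_lrs) > threshold
```

In practice the noise variance and the a priori SNR have to be estimated frame by frame; the sketch takes them as given.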
In this letter, we propose a novel approach to feature compensation for robust speech recognition in noisy environments. We employ the switching linear dynamic model (SLDM) as a parametric model for the clean speech distribution, which enables us to exploit temporal correlations inherent in speech signals. Both the background noise and clean speech …
In this letter, we propose a novel approach to voice activity detection (VAD) based on the modified maximum a posteriori (MAP) criterion conditioned on the voice activity decision made in the previous frame. To exploit the inter-frame correlation of voice activity, the probability of the voice presence conditioned on both the observed spectrum and the voice …
Voice activity detectors (VADs) based on statistical models have shown impressive performance, especially when fairly precise statistical models are employed. Moreover, the accuracy of a VAD utilizing statistical models can be significantly improved when machine-learning techniques are adopted to provide prior knowledge of speech characteristics. In …
In this letter, we present results of distribution tests that indicate that for many natural images, the statistics of the discrete cosine transform (DCT) coefficients are best approximated by a generalized gamma function (GΓF), which includes the conventional Gaussian, Laplacian, and gamma probability density functions. The major parameter of the …
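To make the claim that one family covers the Gaussian, Laplacian, and gamma densities concrete, the sketch below evaluates a two-sided generalized gamma density. The parameterization and the parameter names (gamma, eta, beta) are assumptions for illustration and may differ from the one used in the letter.

```python
import math

def generalized_gamma_pdf(x, gamma, eta, beta):
    """Two-sided generalized gamma density (assumed parameterization):
    f(x) = gamma * beta**eta / (2 * Gamma(eta)) * |x|**(eta*gamma - 1) * exp(-beta * |x|**gamma)
    Note: x must be nonzero when eta * gamma < 1, since the density diverges at 0.
    """
    ax = abs(x)
    norm = gamma * beta ** eta / (2.0 * math.gamma(eta))
    return norm * ax ** (eta * gamma - 1.0) * math.exp(-beta * ax ** gamma)

# Special cases under this parameterization:
#   gamma=2, eta=0.5 -> Gaussian with variance 1 / (2 * beta)
#   gamma=1, eta=1   -> Laplacian with rate beta
#   gamma=1, eta=0.5 -> two-sided gamma density
```

For example, generalized_gamma_pdf(x, 2.0, 0.5, 0.5) coincides with the standard normal density, and generalized_gamma_pdf(x, 1.0, 1.0, 1.0) with the unit-rate Laplacian.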
In this letter, we propose a novel approach to human activity recognition. We present a class of features that are robust to the tilt of the attached sensor module and a state transition model suitable for HMM-based activity recognition. In addition, postprocessing techniques are applied to stabilize the recognition results. The proposed approach shows …
This paper proposes a voice activity detector (VAD) based on the complex Laplacian model. With the use of a goodness-of-fit (GOF) test, it is discovered that the Laplacian model is more suitable to describe noisy speech distribution than the conventional Gaussian model. The likelihood ratio (LR) based on the Laplacian model is computed and then applied to …