One of the key issues in practical speech processing is achieving voice activity detection (VAD) that is robust to background noise. Most statistical model-based approaches employ the Gaussian assumption in the discrete Fourier transform (DFT) domain, which, however, deviates from real observations. In this paper, we propose a…
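For context, the conventional Gaussian approach that this abstract refers to can be sketched as a likelihood ratio test on DFT coefficients. The sketch below is not the method proposed in the paper (the abstract is truncated before it is described). It assumes zero-mean complex Gaussian models for noise and speech in each frequency bin, and the function names, the a priori SNR inputs `xis`, and the threshold are illustrative assumptions.

```python
import math

def gaussian_log_lr(power, noise_var, xi):
    """Per-bin log likelihood ratio of speech-present vs. speech-absent.

    Assumes complex Gaussian DFT coefficients: variance noise_var under
    speech absence and noise_var * (1 + xi) under speech presence, where
    xi is the a priori SNR (hypothetical input, estimated elsewhere).
    """
    gamma = power / noise_var  # a posteriori SNR
    return gamma * xi / (1.0 + xi) - math.log(1.0 + xi)

def frame_is_speech(powers, noise_vars, xis, threshold=0.0):
    """Average the per-bin log likelihood ratios and compare to a threshold."""
    llrs = [gaussian_log_lr(p, nv, xi)
            for p, nv, xi in zip(powers, noise_vars, xis)]
    return sum(llrs) / len(llrs) > threshold
```

Averaging the log ratios over bins corresponds to taking the geometric mean of the per-bin likelihood ratios; the threshold trades off missed speech against false alarms.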
In this paper, we generalize the relations between clean and noisy speech signals using the vector Taylor series (VTS) expansion for noise-robust speech recognition. We use it for both noisy data compensation and hidden Markov model (HMM) parameter adaptation, and apply it directly in the cepstral domain, while Moreno used it to estimate the log-spectral…
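To illustrate the idea behind a VTS expansion, the sketch below linearizes the standard additive-noise relation in the log-spectral domain for a single frequency bin. It is a simplified scalar illustration, not the cepstral-domain formulation of the paper: it assumes noise and speech powers add, ignores channel distortion, and the function names are hypothetical.

```python
import math

def mismatch(x, n):
    """Exact relation y = x + log(1 + exp(n - x)).

    x, n, y are log powers of clean speech, noise, and noisy speech,
    so that exp(y) = exp(x) + exp(n).
    """
    return x + math.log1p(math.exp(n - x))

def vts_first_order(x, n, x0, n0):
    """First-order Taylor approximation of mismatch() around (x0, n0)."""
    # dy/dx at the expansion point; dy/dn equals 1 - g.
    g = 1.0 / (1.0 + math.exp(n0 - x0))
    return mismatch(x0, n0) + g * (x - x0) + (1.0 - g) * (n - n0)
```

Because the approximation is linear in `x` and `n`, Gaussian model parameters can be propagated through it in closed form, which is what makes this kind of expansion attractive for both data compensation and model adaptation.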
In this letter, we propose a novel approach to voice activity detection (VAD) based on the modified maximum a posteriori (MAP) criterion conditioned on the voice activity decision made in the previous frame. To exploit the inter-frame correlation of voice activity, the probability of voice presence conditioned on both the observed spectrum and the voice…
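One way to picture a MAP decision conditioned on the previous frame is as a likelihood ratio test whose threshold depends on the previous decision. The sketch below is an illustration under stated assumptions, not the rule given in the letter: it assumes the observation likelihoods do not depend on the previous decision, and the transition probabilities, `alpha`, and function names are hypothetical.

```python
def conditional_threshold(alpha, p_speech_given_prev):
    """Threshold on the likelihood ratio implied by a posterior-odds test.

    The posterior odds exceed alpha exactly when the likelihood ratio
    exceeds alpha * P(H0 | prev) / P(H1 | prev).
    """
    return alpha * (1.0 - p_speech_given_prev) / p_speech_given_prev

def decide(lrs, alpha=1.0, p_stay_speech=0.9, p_enter_speech=0.2):
    """Run frame-by-frame decisions; each one conditions on the previous."""
    prev = False  # assume the signal starts in a speech-absent state
    decisions = []
    for lr in lrs:
        p1 = p_stay_speech if prev else p_enter_speech
        prev = lr > conditional_threshold(alpha, p1)
        decisions.append(prev)
    return decisions
```

With these illustrative values, the same likelihood ratio of 1.0 is classified as speech right after a speech frame and as non-speech right after a non-speech frame, which is the kind of inter-frame dependence the abstract describes.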
We propose a voice activity detection (VAD) algorithm based on the generalized gamma distribution (GΓD). The distributions of noise spectra and noisy speech spectra, including speech-inactive intervals, are modeled by a set of GΓDs and applied to the likelihood ratio test (LRT) for VAD. The parameters of the GΓD are estimated through an on-line maximum…
In this letter, we present results of distribution tests indicating that, for many natural images, the statistics of the discrete cosine transform (DCT) coefficients are best approximated by a generalized gamma function (GΓF), which includes the conventional Gaussian, Laplacian, and gamma probability density functions. The major parameter of the…
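The claim that the generalized gamma family contains the Gaussian, Laplacian, and gamma densities can be checked numerically. The sketch below assumes a common two-sided parameterization with shape parameters `gamma` and `eta` and scale parameter `beta`; the abstracts above do not spell out their parameterization, so this form and the function name are assumptions.

```python
import math

def gen_gamma_pdf(x, gamma, eta, beta):
    """Two-sided generalized gamma density (assumed parameterization).

    f(x) = gamma * beta**eta / (2 * Gamma(eta))
           * |x|**(eta * gamma - 1) * exp(-beta * |x|**gamma)

    x must be nonzero when eta * gamma < 1, where the density diverges.
    """
    ax = abs(x)
    norm = gamma * beta ** eta / (2.0 * math.gamma(eta))
    return norm * ax ** (eta * gamma - 1.0) * math.exp(-beta * ax ** gamma)

# Special cases under this parameterization:
#   gamma=2, eta=0.5 -> Gaussian with variance 1 / (2 * beta)
#   gamma=1, eta=1   -> Laplacian with rate beta
#   gamma=1, eta=0.5 -> two-sided gamma density
x = 0.7
gaussian = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
laplacian = 0.5 * 2.0 * math.exp(-2.0 * abs(x))
```

Here `gen_gamma_pdf(x, 2.0, 0.5, 0.5)` agrees with `gaussian` and `gen_gamma_pdf(x, 1.0, 1.0, 2.0)` agrees with `laplacian`, up to floating-point error.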
Voice activity detectors (VADs) based on statistical models have shown impressive performance, especially when fairly precise statistical models are employed. Moreover, the accuracy of a VAD that uses statistical models can be significantly improved when machine-learning techniques are adopted to provide prior knowledge of speech characteristics. In…
In this letter, we propose a novel approach to human activity recognition. We present a class of features that are robust to the tilt of the attached sensor module and a state transition model suitable for HMM-based activity recognition. In addition, postprocessing techniques are applied to stabilize the recognition results. The proposed approach…
In this paper, we propose a novel approach to automatic speech segmentation for unit-selection-based text-to-speech systems. Instead of using a single automatic segmentation machine (ASM), we make use of multiple independent ASMs to produce a final boundary time-mark. Specifically, given multiple boundary time-marks provided by separate ASMs, we first…