Jong Won Shin

In this letter, we propose a novel approach to voice activity detection (VAD) based on a modified maximum a posteriori (MAP) criterion conditioned on the voice activity decision made in the previous frame. To exploit the inter-frame correlation of voice activity, the probability of voice presence conditioned on both the observed spectrum and the voice…
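A minimal Python sketch of the general idea follows. It rests on stated assumptions and is not the letter's modified criterion. The per-bin likelihood ratio uses the common complex Gaussian statistical model, with a priori SNR `xi` and a posteriori SNR `gamma`. Bins are combined by a geometric mean. The previous decision enters only through a prior of speech presence, which is set by two hypothetical transition probabilities, `p_stay_active` and `p_become_active`.

```python
import math

def log_gaussian_lr(noisy_power, noise_var, speech_var):
    """Log likelihood ratio of one frequency bin under the complex Gaussian model."""
    xi = speech_var / noise_var        # a priori SNR
    gamma = noisy_power / noise_var    # a posteriori SNR
    return gamma * xi / (1.0 + xi) - math.log1p(xi)

def map_vad_decision(noisy_powers, noise_vars, speech_vars, prev_active,
                     p_stay_active=0.8, p_become_active=0.2):
    """Return True (speech) when the posterior odds of speech presence exceed 1.

    The two transition probabilities are hypothetical placeholders, not values
    taken from the letter.
    """
    # Geometric mean of the per-bin likelihood ratios, computed in the log domain.
    log_lr = sum(log_gaussian_lr(p, n, s)
                 for p, n, s in zip(noisy_powers, noise_vars, speech_vars))
    log_lr /= len(noisy_powers)
    # Prior probability of speech given the decision made in the previous frame.
    prior = p_stay_active if prev_active else p_become_active
    # MAP rule: posterior odds = likelihood ratio * prior odds.
    return log_lr + math.log(prior / (1.0 - prior)) > 0.0
```

With these placeholder values, consider a borderline frame in which the noisy power equals the noise variance and the a priori SNR is 1. That frame is labeled speech only when the previous frame was active, which is the inter-frame effect the conditioning is meant to capture.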
Voice activity detectors (VADs) based on statistical models have shown impressive performance, especially when fairly precise statistical models are employed. Moreover, the accuracy of a VAD utilizing statistical models can be significantly improved when machine-learning techniques are adopted to provide prior knowledge of speech characteristics. In…
In this letter, we present results of distribution tests indicating that, for many natural images, the statistics of the discrete cosine transform (DCT) coefficients are best approximated by a generalized gamma function (GΓF), which includes the conventional Gaussian, Laplacian, and gamma probability density functions as special cases. The major parameter of the…
This paper proposes a voice activity detector (VAD) based on the complex Laplacian model. Goodness-of-fit (GOF) tests show that the Laplacian model describes the distribution of noisy speech better than the conventional Gaussian model does. The likelihood ratio (LR) based on the Laplacian model is computed and then applied to…
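The Laplacian likelihood ratio can be sketched as follows. The sketch makes three assumptions. First, the real and imaginary parts of each spectral coefficient are independent zero-mean Laplacian variables, each carrying half of the total variance. Second, under the speech-present hypothesis the noisy coefficient is again Laplacian, with variance equal to the noise variance plus the speech variance. Third, the per-bin log ratios are averaged and compared with a hypothetical `threshold`. The function names are illustrative.

```python
import math

def complex_laplacian_logpdf(z, var):
    """Log-density of complex z whose real and imaginary parts are independent
    zero-mean Laplacian variables, each with variance var / 2."""
    b = math.sqrt(var) / 2.0  # Laplacian scale: 2 * b**2 == var / 2
    return -2.0 * math.log(2.0 * b) - (abs(z.real) + abs(z.imag)) / b

def laplacian_vad(coeffs, noise_vars, speech_vars, threshold=0.0):
    """Average per-bin log LR of H1 (speech + noise) versus H0 (noise only)."""
    llr = sum(complex_laplacian_logpdf(z, nv + sv) - complex_laplacian_logpdf(z, nv)
              for z, nv, sv in zip(coeffs, noise_vars, speech_vars))
    llr /= len(coeffs)
    return llr > threshold, llr
```

Large coefficients relative to the noise variance push the ratio positive. Near-zero coefficients favor the noise-only hypothesis because its density is more concentrated.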
We propose a voice activity detection (VAD) algorithm based on the generalized gamma distribution (GΓD). The distributions of noise spectra and of noisy speech spectra, including speech-inactive intervals, are modeled by a set of GΓDs and applied to the likelihood ratio test (LRT) for VAD. The parameters of the GΓD are estimated through an on-line maximum…
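A likelihood ratio test with GΓD models can be sketched in Python as below. The sketch assumes that the GΓD is applied to spectral magnitudes x > 0 in the form f(x) = gam * beta**eta / Gamma(eta) * x**(eta*gam - 1) * exp(-beta * x**gam). It also assumes that each hypothesis has its own per-bin parameter triple `(gam, eta, beta)`, and that the decision compares the averaged log ratio with a hypothetical `threshold`. The parameterization and names are illustrative, not taken from the paper.

```python
import math

def ggd_logpdf(x, gam, eta, beta):
    """Log-density of a generalized gamma distribution on x > 0:
    f(x) = gam * beta**eta / Gamma(eta) * x**(eta*gam - 1) * exp(-beta * x**gam)."""
    return (math.log(gam) + eta * math.log(beta) - math.lgamma(eta)
            + (eta * gam - 1.0) * math.log(x) - beta * x ** gam)

def ggd_lrt(magnitudes, params_h0, params_h1, threshold=0.0):
    """Average per-bin log-likelihood ratio; each params entry is (gam, eta, beta)."""
    llr = sum(ggd_logpdf(x, *p1) - ggd_logpdf(x, *p0)
              for x, p0, p1 in zip(magnitudes, params_h0, params_h1))
    llr /= len(magnitudes)
    return llr > threshold, llr
```

With `gam = 2` and `eta = 1`, the density reduces to a Rayleigh distribution with mean-square value `1 / beta`. This is the magnitude distribution implied by the complex Gaussian model, so the family contains that model as a special case.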
Non-negative matrix factorization (NMF) is one of the best-known techniques for separating a desired source from mixture data. In the NMF framework, a collection of data is factorized into a basis matrix and an encoding matrix. The basis matrix for mixture data is usually constructed by augmenting the basis matrices for independent…
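The augmented-basis formulation can be sketched in NumPy as follows. The sketch assumes that `W_speech` and `W_noise` are pre-trained, non-negative bases held fixed. The encoding matrix is fitted with the standard Lee-Seung multiplicative updates for the generalized Kullback-Leibler divergence, and the speech estimate uses a Wiener-like soft mask. The function name and parameters are hypothetical.

```python
import numpy as np

def separate_with_augmented_basis(V, W_speech, W_noise, n_iter=200, eps=1e-12, seed=0):
    """Estimate the speech part of a non-negative mixture spectrogram V (F x T)."""
    W = np.hstack([W_speech, W_noise])  # augmented basis for the mixture
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps  # encoding matrix
    for _ in range(n_iter):
        # KL multiplicative update of H with the augmented basis held fixed.
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ np.ones_like(V) + eps)
    k = W_speech.shape[1]
    speech_part = W_speech @ H[:k]  # rows of H belonging to the speech basis
    noise_part = W_noise @ H[k:]    # rows of H belonging to the noise basis
    mask = speech_part / (speech_part + noise_part + eps)  # Wiener-like soft mask
    return mask * V
```

Splitting the encoding matrix by the column blocks of the augmented basis is what lets one factorization yield separate speech and noise reconstructions.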
In this letter, the generalized gamma distribution (GΓD) is introduced as a new statistical model of the spectral distribution for the likelihood ratio test performed in voice activity detection (VAD). A gradient-based on-line algorithm is proposed to estimate the parameters of the GΓD according to the maximum likelihood criterion. Experimental results…
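The abstract does not give the on-line estimator itself. The sketch below shows only the general shape of such an update, under a simplification: the shape parameters `gam` and `eta` are held fixed, and only `beta` is adapted by one stochastic-gradient ascent step per observation on the log-likelihood of f(x) = gam * beta**eta / Gamma(eta) * x**(eta*gam - 1) * exp(-beta * x**gam). The step is taken in `log(beta)` to keep `beta` positive. The function name and `step` size are hypothetical.

```python
import math

def online_beta_update(beta, x, gam, eta, step=0.05):
    """One stochastic-gradient ascent step on log f(x) with respect to log(beta),
    holding gam and eta fixed (a simplification of full GΓD estimation)."""
    # d log f / d log(beta) = eta - beta * x**gam
    log_beta = math.log(beta) + step * (eta - beta * x ** gam)
    return math.exp(log_beta)
```

The expected gradient vanishes at beta = eta / E[x**gam], which is the maximum likelihood value of `beta` for fixed `gam` and `eta`. Feeding a stream of samples therefore drives `beta` toward that value. Estimating `gam` and `eta` has no such closed form, which is where a gradient-based scheme becomes necessary.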
In this paper, we propose a novel approach to speech enhancement that incorporates a new criterion based on residual noise shaping. In the proposed approach, our goal is to make the residual noise perceptually comfortable even when its power is relatively high. In contrast to conventional techniques, the proposed approach…
In the presence of background noise, the perceived loudness of the speech signal decreases significantly, degrading intelligibility and clarity. In this letter, we propose a novel approach to enhance the perceived quality of the speech signal when the additive noise cannot be directly controlled. Instead of controlling the background…
This letter presents a speech enhancement technique combining statistical models and non-negative matrix factorization (NMF) with on-line update of speech and noise bases. Statistical model-based enhancement methods are known to be less effective against non-stationary noise, while template-based enhancement techniques can deal with it quite…
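A frame-by-frame NMF step with an on-line basis update can be sketched in NumPy as follows. The sketch covers only the NMF side; the statistical-model component is not shown. It also updates only the noise basis, although the letter describes updating both speech and noise bases. The sketch assumes KL multiplicative updates, column-normalized bases, and a hypothetical smoothing factor `alpha` for the basis update. The function name is illustrative.

```python
import numpy as np

def online_nmf_frame(v, W_speech, W_noise, alpha=0.1, n_iter=50, eps=1e-12):
    """Process one magnitude-spectrum frame v (length F).

    Returns (speech estimate, updated W_noise); W_speech is left unchanged.
    """
    W = np.hstack([W_speech, W_noise])
    h = np.ones(W.shape[1])
    for _ in range(n_iter):
        # KL multiplicative update of the activations with both bases fixed.
        h *= (W.T @ (v / (W @ h + eps))) / (W.sum(axis=0) + eps)
    k = W_speech.shape[1]
    speech = W_speech @ h[:k]
    noise = W_noise @ h[k:]
    speech_est = v * speech / (speech + noise + eps)  # Wiener-like soft mask
    # KL multiplicative update of the noise basis for a single frame: when the
    # activation is positive, the update factor reduces to v / WH for every column.
    ratio = v / (W @ h + eps)
    W_new = W_noise * ratio[:, None]
    W_new /= W_new.sum(axis=0, keepdims=True) + eps  # keep columns normalized
    # Exponential smoothing keeps the noise basis from jumping frame to frame.
    W_noise = (1.0 - alpha) * W_noise + alpha * W_new
    return speech_est, W_noise
```

Smoothing the basis rather than replacing it outright trades tracking speed for stability. A small `alpha` lets the noise basis follow slowly varying non-stationary noise without absorbing short speech segments.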