• The degraded speech is modeled in the power-spectral domain as
$$Z(\omega) = |H(\omega)|^2 X(\omega) + N(\omega),$$
where Z(ω) represents the power spectrum of the degraded speech, X(ω) is the power spectrum of the clean speech, H(ω) is the transfer function of the linear filter, and N(ω) is the power spectrum of the additive noise.
• In the log-spectral domain this relation can be expressed as
$$z = x + q + \log\left(1 + e^{\,n - x - q}\right),$$
where z, x, and n are the logarithms of Z(ω), X(ω), and N(ω), and q = log|H(ω)|².
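As a quick numerical check that the two forms of the model agree, the sketch below evaluates the log-spectral expression for a single frequency bin and compares it with the logarithm of the power-domain expression. It is a minimal illustration, assuming scalar per-bin powers; the function names `degraded_power` and `degraded_log_power` are hypothetical.

```python
import math

def degraded_power(X: float, H_mag: float, N: float) -> float:
    """Power-domain model for one frequency bin: Z = |H|^2 * X + N."""
    return H_mag ** 2 * X + N

def degraded_log_power(x: float, q: float, n: float) -> float:
    """Log-domain model: z = x + q + log(1 + exp(n - x - q)).

    x = log X, q = log|H|^2, n = log N for a single frequency bin.
    """
    s = x + q          # log of the filtered clean-speech power
    d = n - s          # log noise-to-signal ratio
    if d > 0:
        # Rewrite log(1 + e^d) as d + log(1 + e^-d) so exp() cannot overflow
        return s + d + math.log1p(math.exp(-d))
    return s + math.log1p(math.exp(d))

# X = 4, |H| = 0.5, N = 1 gives Z = 0.25 * 4 + 1 = 2 in the power domain
z_from_log = degraded_log_power(math.log(4.0), math.log(0.25), math.log(1.0))
z_from_power = math.log(degraded_power(4.0, 0.5, 1.0))
print(z_from_log, z_from_power)
```

The branch on `d` is only a numerical-stability choice: both branches compute the same quantity, but the rewritten form avoids overflowing `math.exp` when the noise dominates.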
Speech recognition systems perform poorly in the presence of corrupting noise. Missing-feature methods attempt to compensate for the noise by removing noise-corrupted components of spectrographic representations of noisy speech and performing recognition with the remaining reliable components. Conventional classifier-compensation methods modify the …
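One widely used classifier-compensation strategy is marginalization: the class likelihood is computed over the reliable spectral components only, with the unreliable ones integrated out. Under a diagonal-covariance Gaussian this reduces to simply skipping the masked components. The sketch below is a minimal illustration of marginalization, not necessarily the method of the paper above; the names `marginal_log_likelihood` and `classify` and the single-Gaussian class models are assumptions.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical class model: (per-component means, per-component variances)
Model = Tuple[Sequence[float], Sequence[float]]

def marginal_log_likelihood(frame: Sequence[float], mask: Sequence[bool],
                            mean: Sequence[float], var: Sequence[float]) -> float:
    """Diagonal-Gaussian log-likelihood using only reliable components.

    Integrating a diagonal Gaussian over an unreliable component yields 1,
    so marginalization amounts to dropping that component's term.
    """
    total = 0.0
    for y, reliable, mu, v in zip(frame, mask, mean, var):
        if reliable:
            total += -0.5 * (math.log(2.0 * math.pi * v) + (y - mu) ** 2 / v)
    return total

def classify(frame: Sequence[float], mask: Sequence[bool],
             models: Dict[str, Model]) -> str:
    """Return the label whose model gives the highest marginal likelihood."""
    return max(models,
               key=lambda label: marginal_log_likelihood(frame, mask, *models[label]))

models = {"a": ([0.0, 0.0], [1.0, 1.0]), "b": ([5.0, 5.0], [1.0, 1.0])}
frame = [0.2, 9.0]  # second component assumed corrupted by a noise burst
print(classify(frame, [True, False], models))  # uses only the first component
print(classify(frame, [True, True], models))   # the corrupted component sways the decision
```

In this toy case the corrupted second component pulls the full-feature decision toward class "b", while the marginalized decision, which ignores it, stays with class "a".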
Sphinx-4 is a flexible, modular, and pluggable framework designed to foster innovation in core research on hidden Markov model (HMM) recognition systems. The design of Sphinx-4 is based on patterns that have emerged from the design of past systems, as well as on new requirements arising from areas that researchers currently want to explore. To exercise this …
In this paper we describe a model developed for the analysis of acoustic spectra. Unlike decomposition techniques, which can produce results that are difficult to interpret, this model explicitly models spectra as distributions and extracts sets of additive, semantically useful components that facilitate a variety of applications ranging from source separation, …
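To make "modeling spectra as distributions with additive components" concrete, the sketch below fits a plain asymmetric probabilistic latent component analysis (PLCA) model, P(f|t) = Σ_z P(f|z) P(z|t), with expectation-maximization. It is a minimal sketch under that assumption; the paper's model may add priors, sparsity, or other structure not shown here, and the names `plca` and `reconstruct` are hypothetical.

```python
import random

def _normalize_columns(M):
    """Scale each column of M (a list of rows) to sum to 1; all-zero columns become uniform."""
    rows, cols = len(M), len(M[0])
    for j in range(cols):
        total = sum(M[i][j] for i in range(rows))
        for i in range(rows):
            M[i][j] = M[i][j] / total if total > 0 else 1.0 / rows
    return M

def plca(V, n_components, n_iter=500, seed=0):
    """Fit P(f|t) ~ sum_z P(f|z) P(z|t) to a non-negative F x T spectrogram V by EM.

    Returns W (F x K, W[f][z] = P(f|z)) and H (K x T, H[z][t] = P(z|t)).
    """
    rng = random.Random(seed)
    F, T, K = len(V), len(V[0]), n_components
    W = _normalize_columns([[rng.random() + 0.1 for _ in range(K)] for _ in range(F)])
    H = _normalize_columns([[rng.random() + 0.1 for _ in range(T)] for _ in range(K)])
    for _ in range(n_iter):
        W_acc = [[0.0] * K for _ in range(F)]
        H_acc = [[0.0] * T for _ in range(K)]
        for f in range(F):
            for t in range(T):
                if V[f][t] <= 0:
                    continue
                joint = [W[f][z] * H[z][t] for z in range(K)]
                total = sum(joint)
                if total <= 0:
                    continue
                for z in range(K):
                    # E-step: posterior P(z|f,t), weighted by the observed energy V[f][t]
                    weight = V[f][t] * joint[z] / total
                    W_acc[f][z] += weight
                    H_acc[z][t] += weight
        # M-step: renormalize expected counts into conditional distributions
        W = _normalize_columns(W_acc)
        H = _normalize_columns(H_acc)
    return W, H

def reconstruct(W, H):
    """Model distribution P(f|t) = sum_z W[f][z] * H[z][t]."""
    K, T = len(H), len(H[0])
    return [[sum(W[f][z] * H[z][t] for z in range(K)) for t in range(T)]
            for f in range(len(W))]

# Toy spectrogram: frame 0 is one "source", frame 1 another, frame 2 an equal mix
V = [[6, 0, 3], [3, 1, 2], [1, 3, 2], [0, 6, 3]]
W, H = plca(V, n_components=2)
```

Each extracted component W[·][z] is itself a spectral distribution, and H gives each frame's mixing proportions, which is what makes the components additive and directly interpretable.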
In this paper we describe a technique for extracting multiple local shift-invariant features from non-negative data of arbitrary dimensionality. Our approach employs a probabilistic latent variable model with sparsity constraints. We demonstrate its utility by performing feature extraction in a variety of domains ranging from …
Most state-of-the-art action feature extractors involve differential operators, which act as high-pass filters and tend to attenuate low-frequency action information. This attenuation introduces bias into the resulting features and generates ill-conditioned feature matrices. The Gaussian Pyramid has been used as a feature-enhancing technique that encodes …
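The high-pass claim can be checked directly from a filter's frequency response. The sketch below evaluates the magnitude response of a first-difference kernel, a 1-D stand-in for a temporal derivative, whose gain 2|sin(ω/2)| vanishes at ω = 0. For contrast it also evaluates a 3-tap binomial kernel, a common approximation to the Gaussian smoothing used when building a Gaussian pyramid, which is low-pass. This is a 1-D illustration only; real action feature extractors apply such operators over space and time, and the helper name `magnitude_response` is hypothetical.

```python
import cmath
import math

def magnitude_response(kernel, omega):
    """|sum_k h[k] * exp(-j * omega * k)| for an FIR kernel h at normalized frequency omega."""
    return abs(sum(h * cmath.exp(-1j * omega * k) for k, h in enumerate(kernel)))

first_difference = [1.0, -1.0]   # x[n] - x[n-1], a discrete derivative
binomial = [0.25, 0.5, 0.25]     # small Gaussian-like smoothing kernel

for omega in (0.0, 0.1, 0.5, 1.0, math.pi):
    print(f"omega={omega:.2f}  "
          f"derivative gain={magnitude_response(first_difference, omega):.3f}  "
          f"smoothing gain={magnitude_response(binomial, omega):.3f}")
```

The derivative's gain rises from 0 at DC to 2 at the Nyquist frequency, so slowly varying (low-frequency) motion is suppressed; the smoothing kernel behaves in the opposite way, with gain cos²(ω/2).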
We present an algorithm for the dereverberation of speech signals for automatic speech recognition (ASR) applications. ASR systems are often presented with speech recorded in environments that include noise and reverberation, and their performance degrades as the levels of noise and reverberation increase. While many algorithms have been …
In this paper we describe a methodology for model-based single-channel separation of sounds. We present a sparse latent variable model that can learn sounds based on their distribution of time/frequency energy. This model can then be used to extract known types of sounds from mixtures in two scenarios. The first is the case where all sound types in the mixture …
In this article we have reviewed a wide variety of techniques based on the identification of missing spectral features that have proved effective in reducing the error rates of automatic speech recognition systems. These approaches have been conspicuously effective in ameliorating the effects of transient maskers such as impulsive noise or background music.
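Identifying missing features usually means estimating a binary spectrographic mask that marks each time-frequency cell as reliable or unreliable. The sketch below shows one simple estimator: the noise power per bin is averaged over the first few frames (assumed to be noise-only), speech power is estimated by spectral subtraction, and a cell is marked reliable when its estimated local SNR exceeds a threshold. The noise-only leading frames, the 0 dB threshold, and the name `estimate_mask` are all assumptions, and practical systems often use more elaborate estimators.

```python
import math

def estimate_mask(power_spec, n_noise_frames=5, threshold_db=0.0, floor=1e-12):
    """Binary reliability mask from a power spectrogram (a list of frames of per-bin powers).

    True marks a cell whose estimated local SNR exceeds threshold_db.
    """
    n = min(n_noise_frames, len(power_spec))
    n_bins = len(power_spec[0])
    # Noise estimate: mean power per bin over the assumed noise-only leading frames
    noise = [sum(frame[k] for frame in power_spec[:n]) / n for k in range(n_bins)]
    mask = []
    for frame in power_spec:
        row = []
        for k, y in enumerate(frame):
            speech = max(y - noise[k], floor)  # spectral subtraction, floored to stay positive
            snr_db = 10.0 * math.log10(speech / max(noise[k], floor))
            row.append(snr_db > threshold_db)
        mask.append(row)
    return mask

spec = [[1.0, 1.0], [1.0, 1.0], [10.0, 1.5]]
print(estimate_mask(spec, n_noise_frames=2))
```

In the last frame, the first bin (estimated SNR of about 9.5 dB) is kept as reliable, while the second bin (about -3 dB) is flagged as missing.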
We present a technique for denoising speech using nonnegative matrix factorization (NMF) in combination with statistical speech and noise models. We compare our new technique to standard NMF and to a state-of-the-art Wiener filter implementation and show improvements in speech quality across a range of interfering noise types.
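For reference, the standard NMF baseline mentioned above can be sketched as follows: speech and noise basis spectra are assumed to be trained beforehand and held fixed, only the activations are updated (here with the Kullback-Leibler multiplicative update), and the speech estimate is obtained with a Wiener-style mask. This sketches plain NMF denoising only, not the paper's combination with statistical speech and noise models; the name `nmf_denoise` and the toy bases are hypothetical.

```python
def nmf_denoise(V, W_speech, W_noise, n_iter=200, eps=1e-12):
    """Separate speech from a noisy magnitude spectrogram V (F x T) using fixed bases.

    W_speech (F x Ks) and W_noise (F x Kn) are assumed pre-trained basis spectra.
    Activations H are updated with the KL-divergence multiplicative rule.
    """
    F, T = len(V), len(V[0])
    W = [ws + wn for ws, wn in zip(W_speech, W_noise)]  # concatenate bases column-wise
    Ks, K = len(W_speech[0]), len(W[0])
    H = [[1.0] * T for _ in range(K)]
    col_sums = [sum(W[f][k] for f in range(F)) for k in range(K)]
    for _ in range(n_iter):
        WH = [[sum(W[f][k] * H[k][t] for k in range(K)) + eps for t in range(T)]
              for f in range(F)]
        for k in range(K):
            for t in range(T):
                # H <- H * (W^T (V / WH)) / (W^T 1), all entries updated from the same WH
                num = sum(W[f][k] * V[f][t] / WH[f][t] for f in range(F))
                H[k][t] *= num / (col_sums[k] + eps)
    # Wiener-style mask: share of the model attributed to the speech bases
    S = []
    for f in range(F):
        row = []
        for t in range(T):
            speech = sum(W[f][k] * H[k][t] for k in range(Ks))
            total = sum(W[f][k] * H[k][t] for k in range(K)) + eps
            row.append(V[f][t] * speech / total)
        S.append(row)
    return S

# One frame: 10 units of a speech-like spectrum plus 2 units of a noise-like spectrum
S = nmf_denoise([[9.2], [2.8]], W_speech=[[0.9], [0.1]], W_noise=[[0.1], [0.9]], n_iter=500)
print(S)
```

Keeping the bases fixed turns the factorization into a convex fit of the activations alone, which is why this toy mixture is recovered almost exactly.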