The objective of this paper is threefold: (1) to provide an extensive review of signal subspace speech enhancement, (2) to derive an upper bound for the performance of these techniques, and (3) to present a comprehensive study of the potential of subspace filtering to increase the robustness of automatic speech recognisers against stationary additive noise …
This paper presents the derivation of a new perceptual model that represents speech and audio signals by a sum of exponentially damped sinusoids. Compared to a traditional sinusoidal model, the exponential sinusoidal model (ESM) is better suited to model transient segments that are readily found in audio signals. Total least squares (TLS) algorithms are …
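As a rough illustration of what "a sum of exponentially damped sinusoids" means, the sketch below synthesises such a signal with NumPy. The function name `synthesize_esm`, its parameter names, and the units (damping in 1/s, frequency in Hz) are assumptions made for this example; the abstract does not specify a parameterisation, and parameter estimation with TLS is not shown.

```python
import numpy as np

def synthesize_esm(amplitudes, dampings, freqs_hz, phases, n_samples, fs):
    """Sum of exponentially damped sinusoids (hypothetical parameterisation).

    Component k contributes a_k * exp(-d_k * t) * cos(2*pi*f_k*t + phi_k),
    with t in seconds, d_k in 1/s and f_k in Hz.
    """
    t = np.arange(n_samples) / fs          # time axis in seconds
    signal = np.zeros(n_samples)
    for a, d, f, p in zip(amplitudes, dampings, freqs_hz, phases):
        envelope = a * np.exp(-d * t)      # decaying amplitude envelope
        signal += envelope * np.cos(2 * np.pi * f * t + p)
    return signal

# A decaying 440 Hz component plus a faster-decaying 1200 Hz component.
x = synthesize_esm([1.0, 0.5], [50.0, 200.0], [440.0, 1200.0], [0.0, 0.0],
                   n_samples=800, fs=8000)
```

Setting every damping factor to zero reduces this to an ordinary stationary sinusoidal model, which is why the damped form is the more flexible choice for transients.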
We present a new algorithm for the estimation of the voicing cut-off frequency (VCO), i.e., the frequency that separates the harmonic low-frequency part from the aperiodic high-frequency part in voiced speech. The VCO is estimated as the frequency for which the sum of the harmonicity scores of all pitch harmonics below that frequency is maximized. The …
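The maximisation described here can be sketched as a cumulative sum followed by an argmax. This is a minimal sketch under stated assumptions: the abstract does not define the harmonicity score, so the code takes one precomputed score per pitch harmonic as input and assumes the scores are signed (positive for harmonic-looking peaks, negative for aperiodic ones), since otherwise the running sum could never peak before the last harmonic. The function name `estimate_vco` and the choice to place the VCO exactly on a harmonic frequency are also assumptions.

```python
import numpy as np

def estimate_vco(harmonicity_scores, f0_hz):
    """Return the cut-off frequency that maximises the summed harmonicity.

    harmonicity_scores[k] is the (assumed signed) score of harmonic k+1.
    A leading zero lets the result be 0 Hz when no harmonic scores positively.
    """
    scores = np.asarray(harmonicity_scores, dtype=float)
    cumulative = np.concatenate(([0.0], np.cumsum(scores)))
    n_harmonics = int(np.argmax(cumulative))   # harmonics kept below the VCO
    return n_harmonics * f0_hz

# Hypothetical scores for seven harmonics of a 100 Hz pitch.
scores = [0.9, 0.8, 0.5, -0.2, 0.1, -0.6, -0.7]
vco = estimate_vco(scores, f0_hz=100.0)
```

With these example scores the running sum peaks after the third harmonic, so the estimate lands at 300 Hz.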
While a traditional sinusoidal model is capable of representing audio segments, a sum of exponentially damped sinusoids is more efficient to model the transient segments that are readily found in audio signals. In this paper, Total Least Squares (TLS) algorithms are applied to automatically extract the modeling parameters in the Exponential Sinusoidal Model …
This paper presents a new approach to improve the robustness of large vocabulary continuous speech recognition. The proposed technique, based on Singular Value Decomposition (SVD), originates from classical signal enhancement, but it is adapted to the specific requirements imposed by the speech recognition process. Additive noise reduction is obtained by …
We present a new algorithm for the automatic estimation of the voicing cut-off frequency (VCO), i.e., the frequency that separates the periodic low-frequency part from the aperiodic high-frequency part in voiced segments of natural speech. Starting from the power spectrum of a two pitch period speech frame, we define the VCO to be located at the frequency …
We describe how Total Least Squares (TLS) algorithms can be applied as a powerful and efficient modelling tool for wideband speech. A detailed description in both time domain and frequency domain illustrates how the modelling functions (damped sinusoids) naturally synthesise non-stationary signals. Straightforward implementations of TLS applied to fullband …
Signal Subspace (SS) based speech enhancement techniques obtain significant additive-noise reduction by altering the singular value spectrum of the speech observation matrix. Among the class of different possible SS weighting strategies, the Minimum Variance (MV) estimation method substantially increases the speech recognition accuracy in additive noise …
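To make "altering the singular value spectrum of the speech observation matrix" concrete, here is a minimal NumPy sketch of one frame of subspace filtering. It rests on several assumptions not stated in the abstract: the observation matrix is a Hankel matrix with more rows than columns, the noise is white with a known variance, and the MV weighting takes the commonly cited white-noise form, gain = 1 - (noise singular value)^2 / (observed singular value)^2. All function names are hypothetical, and this is not presented as the authors' implementation.

```python
import numpy as np

def hankel_matrix(x, n_cols):
    """Stack overlapping length-n_cols windows of x as rows (Hankel structure)."""
    n_rows = len(x) - n_cols + 1
    idx = np.arange(n_rows)[:, None] + np.arange(n_cols)[None, :]
    return x[idx]

def average_antidiagonals(H):
    """Map a matrix back to a signal by averaging entries that share a sample index."""
    n_rows, n_cols = H.shape
    total = np.zeros(n_rows + n_cols - 1)
    counts = np.zeros(n_rows + n_cols - 1)
    for i in range(n_rows):
        total[i:i + n_cols] += H[i]
        counts[i:i + n_cols] += 1
    return total / counts

def mv_subspace_enhance(noisy, noise_var, n_cols=20, rank=None):
    """Reweight singular values with an MV-style gain (white-noise assumption)."""
    noisy = np.asarray(noisy, dtype=float)
    H = hankel_matrix(noisy, n_cols)
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    # For white noise and n_rows >= n_cols, each squared noise singular
    # value is approximately n_rows * noise_var.
    noise_sv2 = H.shape[0] * noise_var
    gains = np.clip(1.0 - noise_sv2 / np.maximum(s ** 2, 1e-12), 0.0, None)
    if rank is not None:
        gains[rank:] = 0.0                 # discard the assumed noise-only subspace
    H_hat = (U * (gains * s)) @ Vt         # rebuild with reweighted singular values
    return average_antidiagonals(H_hat)

# Toy example: a sinusoid (rank-2 Hankel matrix) in white noise.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 0.05 * np.arange(400))
noisy = clean + rng.normal(0.0, 0.5, size=400)
enhanced = mv_subspace_enhance(noisy, noise_var=0.25, n_cols=20, rank=2)
```

Other SS weighting strategies differ only in the `gains` line; plain least-squares truncation, for instance, would set the retained gains to 1.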
We present a harmonic-plus-noise modelling (HNM) strategy in the context of corpus-based text-to-speech (TTS) synthesis, in which whole speech phonemes are modelled in their integrity, contrary to the traditional frame-based approach. The pitch and amplitude trajectories of each phoneme are modelled with a low-order DCT expansion. The parameter analysis …
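A low-order DCT expansion of a trajectory can be sketched as follows. This is an illustrative sketch only: the abstract gives neither the DCT type nor the order, so the code assumes an orthonormal DCT-II and a hypothetical `order` parameter, and the function names are invented for this example.

```python
import numpy as np

def dct_basis(n_points, order):
    """First `order` rows of the orthonormal DCT-II basis for n_points samples."""
    n = np.arange(n_points)
    k = np.arange(order)[:, None]
    basis = np.sqrt(2.0 / n_points) * np.cos(np.pi * (n + 0.5) * k / n_points)
    basis[0] /= np.sqrt(2.0)               # scale the constant row for orthonormality
    return basis

def fit_trajectory(trajectory, order):
    """Project a trajectory onto the low-order basis; returns `order` coefficients."""
    trajectory = np.asarray(trajectory, dtype=float)
    return dct_basis(len(trajectory), order) @ trajectory

def reconstruct_trajectory(coeffs, n_points):
    """Rebuild a smooth trajectory of any length from the stored coefficients."""
    return dct_basis(n_points, len(coeffs)).T @ coeffs

# Hypothetical pitch contour (Hz) over 30 frames of one phoneme.
frames = np.arange(30)
pitch = 120.0 + 10.0 * np.cos(np.pi * (frames + 0.5) / 30)
coeffs = fit_trajectory(pitch, order=3)
smooth = reconstruct_trajectory(coeffs, n_points=30)
```

A handful of coefficients then stands in for the whole per-frame contour, which is the appeal of describing a phoneme as a unit rather than frame by frame.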