Juan Andres Morales-Cordovilla

This paper deals with the problem of finding a suitable window for robust speech recognition in noisy conditions. A set of asymmetric windows, called DDR_{c,w}, is proposed, controlled by two parameters: center c and width w. These windows are derived from the DDR window used in the higher-lag autocorrelation spectrum estimation (HASE)...
In this paper we propose two estimators for the autocorrelation sequence of a periodic signal in additive noise. Both estimators are formulated using tables that contain all possible products of sample pairs in a speech signal frame. The first estimator is based on pitch-synchronous averaging. This estimator is statistically analyzed and we show...
This paper presents a noise estimation technique based on knowledge of pitch information for robust speech recognition. In the first stage, the noise is estimated by extrapolating it from frames where speech is believed to be absent. These frames are detected with a proposed pitch-based voice activity detector (VAD). In the second stage the...
This paper describes the preparation, the speakers, the recordings, and the creation of the orthographic transcriptions of the first large-scale speech database for Austrian German. It contains approximately 1900 minutes of (read and spontaneous) speech produced by 38 speakers. The corpus consists of three components. First, the Conversation...
The problem of room localization is to determine where, in a multi-room environment, a person is producing a speech utterance. In our work, we exploit the information gained from a network of microphones installed throughout a house, where the lack of calibration of the microphone energies creates an additional challenge. This paper compares room...
This paper addresses the problem of distant speech recognition in reverberant noisy conditions employing a microphone array. We present a prototype system that can segment the utterances in real time and generate robust ASR results off-line. The segmentation is carried out by a voice activity detector based on deep belief networks, the speaker localization...
This paper presents the first large-scale analysis of pronunciation variation in conversational Austrian German. Whereas for the varieties of German spoken in Germany, conversational speech has received considerable attention in the fields of linguistics and automatic speech recognition, for conversational Austrian German there is a lack of speech resources and...
This paper proposes a robust pitch extractor for automatic speech recognition, based on selecting pitch lines of a tonegram (a representation of the different pitch energies at each frame time). First, the tonegram and its maximum energy regions are extracted, and a dynamic time warping algorithm finds the most energetic trajectories or...
For automatic speech recognition (ASR) systems it is important that the input signal mainly contains the desired speech signal. For a compact arrangement, differential microphone arrays (DMAs) are a suitable choice as the front end of ASR systems. The limiting factor of DMAs is the white noise gain, which can be treated by the minimum norm solution (MNS). In...