Emotional space improves emotion recognition
TLDR
A new approach to emotion recognition is proposed that uses two of the emotional dimensions and their relationship with different kinds of features, so that a different classification method can be applied to each specific case.
Speaker Diarization For Multiple-Distant-Microphone Meetings Using Several Sources of Information
TLDR
The correlation between signals coming from multiple microphones is analyzed and an improved method for carrying out speaker diarization for meetings with multiple distant microphones is proposed, improving the Diarization Error Rate (DER) by 15% to 20% relative to previous systems.
Robust speaker diarization for meetings: ICSI RT06s evaluation system
TLDR
Four of the main improvements to the ICSI speaker diarization system submitted for the NIST Rich Transcription evaluation (RT06s), conducted on the meetings environment, are introduced, including a new training-free speech/non-speech detection algorithm, a new algorithm for system initialization, and a frame purification algorithm to increase cluster differentiability.
Robust Speaker Diarization for meetings
TLDR
Four of the main improvements to the ICSI speaker diarization system submitted for the NIST Rich Transcription evaluation (RT06s), conducted on the meetings environment, are introduced, including a new training-free speech/non-speech detection algorithm, a new algorithm for system initialization, and a frame purification algorithm to increase cluster differentiability.
Confidence measures for spoken dialogue systems
TLDR
Improved confidence assessment is provided for detecting word-level speech recognition errors, out-of-domain utterances, and incorrect concepts in the CU Communicator system, and a neural network is considered to combine all features at each level.
Speech to sign language translation system for Spanish
TLDR
The development of, and first experiments with, a Spanish-to-sign-language translation system in a real domain are described, focusing on the sentences spoken by an official when assisting people applying for or renewing their identity card.
Classification of epileptic EEG recordings using signal transforms and convolutional neural networks
TLDR
This analysis was carried out using two public datasets (the Bern-Barcelona EEG and Epileptic Seizure Recognition datasets), obtaining significant improvements in accuracy.
Emotional speech synthesis: from speech database to TTS
TLDR
A thorough study of emotional speech in Spanish and its application to TTS is presented, along with a prototype system that simulates emotional speech using a commercial synthesiser.
Speaker Diarization for Multi-microphone Meetings Using Only Between-Channel Differences
TLDR
A method is presented to extract speaker turn segmentation from multiple distant microphones (MDM) using only delay values found via cross-correlation between the available channels; the between-channel delays are processed and clustered to obtain a segmentation hypothesis.
Speaker Diarization Features: The UPM Contribution to the RT09 Evaluation
TLDR
Two new features are proposed and used by the Universidad Politécnica de Madrid in the Rich Transcription Evaluation 2009; applied to the clustering stage of multiple-distant-microphone meeting diarization, they outperform the baseline system.