SEGAN: Speech Enhancement Generative Adversarial Network
TLDR
This work proposes using generative adversarial networks for speech enhancement. The model operates at the waveform level, is trained end-to-end, and incorporates 28 speakers and 40 different noise conditions into a single model, so that model parameters are shared across them.
Speech emotion recognition using hidden Markov models
This paper introduces a first approach to emotion recognition using RAMSES, the UPC’s speech recognition system. The approach is based on standard speech recognition technology using hidden
Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks
TLDR
Experiments show that the proposed improved self-supervised method can learn transferable, robust, and problem-agnostic features that carry relevant information from the speech signal, such as speaker identity, phonemes, and even higher-level features such as emotional cues.
Voice Conversion Based on Weighted Frequency Warping
TLDR
Compared to standard probabilistic systems, Weighted Frequency Warping results in a significant increase in quality scores, whereas the conversion scores remain almost unaltered.
Albayzin speech database: design of the phonetic corpus
TLDR
The phonetic content of Albayzin, a spoken database for Spanish designed for speech recognition purposes, and the phonetic and statistical criteria for the final constitution of the database are discussed.
INCA Algorithm for Training Voice Conversion Systems From Nonparallel Corpora
TLDR
This paper proposes a new iterative alignment method that allows pairing phonetically equivalent acoustic vectors from nonparallel utterances from different speakers, even under cross-lingual conditions, and it does not require any phonetic or linguistic information.
Text-Independent Voice Conversion Based on Unit Selection
TLDR
A new approach is presented that applies unit selection to find corresponding time frames in source and target speech, achieving the same performance as conventional text-dependent training.
Analysis of prosodic features towards modelling of emotional and pragmatic attributes of speech
TLDR
Preliminary results show that emotions can be clearly identified and that there is also a significant correlation between prosody and pragmatic attributes.
...