• Publications
SEGAN: Speech Enhancement Generative Adversarial Network
TLDR
This work proposes the use of generative adversarial networks for speech enhancement. The model operates at the waveform level, is trained end-to-end, and incorporates 28 speakers and 40 different noise conditions, such that model parameters are shared across them.
Overcoming catastrophic forgetting with hard attention to the task
TLDR
A task-based hard attention mechanism that preserves previous tasks' information without affecting the current task's learning, and features the possibility to control both the stability and compactness of the learned knowledge, which makes it also attractive for online learning or network compression applications.
Chroma Binary Similarity and Local Alignment Applied to Cover Song Identification
We present a new technique for audio signal comparison based on tonal subsequence alignment and its application to detect cover versions (i.e., different performances of the same underlying musical
Input complexity and out-of-distribution detection with likelihood-based generative models
TLDR
This paper uses an estimate of input complexity to derive an efficient and parameter-free OOD score, which can be seen as a likelihood ratio, akin to Bayesian model comparison. The score is found to perform comparably to, or even better than, existing OOD detection approaches across a wide range of data sets, models, model sizes, and complexity estimates.
Cross recurrence quantification for cover song identification
TLDR
A recurrence quantification analysis measure is proposed that allows the tracking of potentially curved and disrupted traces in cross recurrence plots (CRPs) and it is shown that this method identifies cover songs with a higher accuracy as compared to previously published techniques.
Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks
TLDR
Experiments show that the proposed improved self-supervised method can learn transferable, robust, and problem-agnostic features that carry relevant information from the speech signal, such as speaker identity, phonemes, and even higher-level features such as emotional cues.
Music Mood Representations from Social Tags
TLDR
This study demonstrates the particular relevance of the basic emotions model, with four mood clusters that can be summarized as happy, sad, angry, and tender.
Unsupervised Detection of Music Boundaries by Time Series Structure Features
TLDR
This paper proposes an unsupervised method for boundary detection, combining three basic principles: novelty, homogeneity, and repetition, which is applicable to a wide range of time series beyond the music and audio domains.
Measuring the Evolution of Contemporary Western Popular Music
TLDR
A number of patterns and metrics are unveiled that characterize the generic usage of primary musical facets such as pitch, timbre, and loudness in contemporary Western popular music, and that have been consistently stable for a period of more than fifty years.