Input complexity and out-of-distribution detection with likelihood-based generative models
- J. Serrà, David Álvarez, V. Gómez, Olga Slizovskaia, José F. Núñez, J. Luque
- Computer Science · ICLR
- 25 September 2019
This paper uses an estimate of input complexity to derive an efficient and parameter-free OOD score, which can be seen as a likelihood-ratio, akin to Bayesian model comparison, and finds such score to perform comparably to, or even better than, existing OOD detection approaches under a wide range of data sets, models, model sizes, and complexity estimates.
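The score described above can be sketched minimally as follows, assuming the complexity estimate L(x) is taken as the length in bits of the zlib-compressed input, and that the model's negative log-likelihood in bits (`nll_bits`, passed in as a plain number here) would come from a separately trained likelihood-based generative model; both choices are illustrative stand-ins, not the paper's exact setup.

```python
import zlib
import numpy as np

def complexity_bits(x: np.ndarray) -> float:
    """Complexity estimate L(x): length in bits of the zlib-compressed input."""
    raw = np.ascontiguousarray(x, dtype=np.uint8).tobytes()
    return 8.0 * len(zlib.compress(raw, 9))

def ood_score(nll_bits: float, x: np.ndarray) -> float:
    """Likelihood-ratio-style OOD score: model NLL in bits minus complexity in bits.

    An input the compressor finds simple (small L) but the model finds
    surprising (large NLL) yields a high score, flagging it as OOD.
    """
    return nll_bits - complexity_bits(x)
```

For instance, a constant image compresses to very few bits, so if a model still assigns it a high NLL, the score is large; a noisy image has high complexity, which discounts its inevitably high NLL.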
Timbre analysis of music audio signals with convolutional neural networks
- Jordi Pons, Olga Slizovskaia, Rong Gong, E. Gómez, X. Serra
- Computer Science · 25th European Signal Processing Conference…
- 20 March 2017
One of the main goals of this work is to design efficient CNN architectures, which reduces the risk of over-fitting because the number of CNN parameters is minimized.
Vocoder-Based Speech Synthesis from Silent Videos
- Daniel Michelsanti, Olga Slizovskaia, G. Haro, Emilia Gómez, Z. Tan, J. Jensen
- Computer Science · INTERSPEECH
- 6 April 2020
A way to synthesise speech from the silent video of a talker using deep learning, which exhibits an improvement over existing video-to-speech approaches.
End-to-end Sound Source Separation Conditioned on Instrument Labels
- Olga Slizovskaia, Leo Kim, G. Haro, E. Gómez
- Computer Science · ICASSP - IEEE International Conference on…
- 5 November 2018
This paper presents an extension of the Wave-U-Net model that allows end-to-end monaural source separation with a non-fixed number of sources, and proposes multiplicative conditioning with instrument labels at the bottleneck of the Wave-U-Net.
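The multiplicative conditioning mentioned above can be illustrated roughly as follows; the projection matrix `W` and the one-hot instrument label are hypothetical stand-ins for learned components of the actual model, which operates on real Wave-U-Net bottleneck features.

```python
import numpy as np

def multiplicative_conditioning(bottleneck: np.ndarray,
                                label_onehot: np.ndarray,
                                W: np.ndarray) -> np.ndarray:
    """Scale bottleneck features by a label-dependent gain per channel.

    bottleneck:   (channels, time) feature map at the network bottleneck
    label_onehot: (n_labels,) one-hot instrument label
    W:            (n_labels, channels) learned projection (illustrative here)
    """
    gamma = label_onehot @ W           # per-channel gains derived from the label
    return bottleneck * gamma[:, None]  # broadcast gains over the time axis
```

Selecting a different instrument label thus re-weights the bottleneck channels, steering the decoder toward separating that instrument.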
Acoustic Scene Classification by Ensembling Gradient Boosting Machine and Convolutional Neural Networks
- Eduardo Fonseca, Rong Gong, D. Bogdanov, Olga Slizovskaia, E. Gómez, X. Serra
- Computer Science · DCASE
Paper presented at the Detection and Classification of Acoustic Scenes and Events 2017 Workshop (DCASE2017), held on 16 November 2017 in Munich, Germany.
Automatic musical instrument recognition in audiovisual recordings by combining image and audio classification strategies
Paper presented at the 13th Sound and Music Computing Conference, held on 31 August 2016 in Hamburg, Germany.
Locate This, Not That: Class-Conditioned Sound Event DOA Estimation
This paper proposes an alternative class-conditioned SELD model that performs better in terms of common SELD metrics than the baseline model that locates all classes simultaneously, and also outperforms specialist models that are trained to locate only a single class of interest.
Conditioned Source Separation for Musical Instrument Performances
This paper proposes a source separation method for multiple musical instruments sounding simultaneously and explores how much additional information apart from the audio stream can lift the quality of source separation.
Musical Instrument Recognition in User-generated Videos using a Multimodal Convolutional Neural Network Architecture
A Convolutional Neural Network architecture that combines learned representations from both modalities at a late fusion stage is developed; it demonstrates state-of-the-art results in audio and video object recognition, provides additional robustness to missing modalities, and remains computationally cheap to train.
Acoustic Scene Classification by Fusing LightGBM and VGG-net Multichannel Predictions
This report provides a solution for Task 1 of the DCASE 2017 challenge by building two parallel audio scene classification systems, LightGBM and VGG-net, and fusing their predictions with linear logistic regression.
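The fusion step can be sketched roughly as follows; the weights `alpha`, `beta`, and `bias` are hypothetical stand-ins for coefficients that would be learned by logistic-regression calibration on a held-out set.

```python
import numpy as np

def fuse_predictions(p_lgbm: np.ndarray, p_vgg: np.ndarray,
                     alpha: float, beta: float, bias: float) -> np.ndarray:
    """Linear logistic fusion of two systems' class probabilities.

    Each system's log-probabilities are combined linearly, then renormalized
    with a softmax to produce fused class probabilities. Shapes: (n, classes).
    """
    logits = alpha * np.log(p_lgbm) + beta * np.log(p_vgg) + bias
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))  # stable softmax
    return e / e.sum(axis=-1, keepdims=True)
```

With alpha = 1 and beta = 0 the fusion degenerates to the first system's predictions, so the learned weights interpolate between trusting either system.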