CP-JKU Submissions for DCASE-2016: A Hybrid Approach Using Binaural i-Vectors and Deep Convolutional Neural Networks
TLDR
This report describes the CP-JKU team's four submissions for Task 1 (acoustic scene classification) of the DCASE-2016 challenge, proposing a novel i-vector extraction scheme for ASC that uses both the left and right audio channels, together with a deep convolutional neural network architecture trained end-to-end on spectrograms of audio excerpts.
Large-Scale Weakly Labeled Semi-Supervised Sound Event Detection in Domestic Environments
TLDR
This paper presents DCASE 2018 Task 4, which evaluates systems for the large-scale detection of sound events using weakly labeled data (without time boundaries), and explores the possibility of exploiting a large amount of unbalanced and unlabeled training data, together with a small weakly labeled training set, to improve system performance.
I-Vectors for Timbre-Based Music Similarity and Music Artist Classification
TLDR
A novel approach to extracting song-level descriptors from frame-level timbral features such as Mel-frequency cepstral coefficients (MFCCs): identity vectors (i-vectors), which are the result of a factor analysis procedure applied to the frame-level features.
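For context on what that factor analysis computes: each recording's GMM mean supervector is modeled as M = m + Tw, where m comes from a universal background model (UBM), T is the total-variability matrix, and the posterior mean of w is the i-vector. Below is a minimal NumPy sketch of that point estimate under simplifying assumptions (diagonal-covariance UBM, no length normalization); all names are illustrative, not the paper's implementation.

```python
import numpy as np

def ivector(frames, ubm_means, ubm_covs, ubm_weights, T):
    """Point estimate of w in M = m + T w (illustrative sketch).

    frames:      (n, D) frame-level features (e.g. MFCCs)
    ubm_means:   (C, D) UBM component means
    ubm_covs:    (C, D) diagonal covariances
    ubm_weights: (C,)   mixture weights
    T:           (C*D, R) total-variability matrix
    """
    C, D = ubm_means.shape
    R = T.shape[1]
    # Component posteriors gamma[t, c] under the diagonal-covariance UBM.
    diff = frames[:, None, :] - ubm_means[None]
    logp = -0.5 * np.sum(diff**2 / ubm_covs + np.log(2 * np.pi * ubm_covs), axis=2)
    logp += np.log(ubm_weights)
    gamma = np.exp(logp - logp.max(axis=1, keepdims=True))
    gamma /= gamma.sum(axis=1, keepdims=True)
    # Zeroth- and centered first-order Baum-Welch statistics.
    N = gamma.sum(axis=0)                         # (C,)
    F = gamma.T @ frames - N[:, None] * ubm_means # (C, D)
    # Posterior precision L and i-vector w = L^{-1} T' Sigma^{-1} F.
    Tc = T.reshape(C, D, R)
    L = np.eye(R)
    b = np.zeros(R)
    for c in range(C):
        Tw = Tc[c] / ubm_covs[c][:, None]         # Sigma_c^{-1} T_c
        L += N[c] * Tc[c].T @ Tw
        b += Tw.T @ F[c]
    return np.linalg.solve(L, b)
```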
The Receptive Field as a Regularizer in Deep Convolutional Neural Networks for Acoustic Scene Classification
TLDR
The receptive field (RF) of CNNs is analysed and its importance for the models' generalization capability is demonstrated: very small or very large RFs can cause performance degradation, but deep models can be made to generalize well by carefully choosing an appropriate RF size within a certain range.
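For a sense of the quantity being controlled, the RF of a stack of convolution and pooling layers follows a standard recursion; the sketch below computes it for a hypothetical VGG-style stack (the layer list is an assumption, not an architecture from the paper).

```python
def receptive_field(layers):
    """Receptive field of stacked conv/pooling layers along one axis.

    Each layer is (kernel_size, stride). Standard recursion:
    rf += (kernel - 1) * jump; jump *= stride.
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

# Hypothetical stack: three 3x3 convolutions, each followed by
# 2x2 max pooling (pooling modeled as kernel 2, stride 2).
layers = [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1), (2, 2)]
print(receptive_field(layers))  # -> 22 pixels along one axis
```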
A hybrid approach with multi-channel i-vectors and convolutional neural networks for acoustic scene classification
TLDR
A novel multi-channel i-vector extraction and scoring scheme for ASC and a CNN architecture that achieves promising ASC results are proposed, and it is shown that i-vectors and CNNs capture complementary information from acoustic scenes.
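One common way to combine such complementary models is score-level (late) fusion; the sketch below is a generic illustration, with the per-clip z-normalization and the weight alpha as assumptions rather than the calibration actually used in the submission.

```python
import numpy as np

def fuse_scores(ivector_scores, cnn_scores, alpha=0.5):
    """Score-level fusion of two classifiers (illustrative).

    Both inputs are (n_clips, n_classes) score matrices. Each is
    z-normalized per clip so the scales are comparable, then combined
    as a convex combination; alpha would be tuned on validation data.
    """
    def znorm(s):
        return (s - s.mean(1, keepdims=True)) / (s.std(1, keepdims=True) + 1e-9)
    fused = alpha * znorm(ivector_scores) + (1 - alpha) * znorm(cnn_scores)
    return fused.argmax(axis=1)  # predicted scene label per clip
```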
Mixture Density Generative Adversarial Networks
TLDR
The proposed model is shown to avoid mode collapse and discover all the modes, and the generated images are of superior quality as measured by the Fréchet Inception Distance (FID), achieving the lowest FID among all baselines.
Emotion and Theme Recognition in Music with Frequency-Aware RF-Regularized CNNs
TLDR
It is observed that ResNets with smaller receptive fields, originally adapted for acoustic scene classification, also perform well on the emotion tagging task, and that their performance can be further improved using techniques such as frequency awareness and Shake-Shake regularization.
Movie genome: alleviating new item cold start in movie recommendation
TLDR
A new movie recommender system that addresses the new-item problem in the movie domain by integrating state-of-the-art audio and visual descriptors and proposing a two-step hybrid approach that trains a collaborative filtering (CF) model on warm items and applies the learned model to the movie genome to recommend cold items.
Audio-visual encoding of multimedia content for enhancing movie recommendations
TLDR
A multi-modal content-based movie recommender system that replaces human-generated metadata with content descriptions automatically extracted from the visual and audio channels of a video, shedding light on the accuracy and beyond-accuracy performance of audio, visual, and textual features in content-based movie recommender systems.
Feature-combination hybrid recommender systems for automated music playlist continuation
TLDR
The results of the experiments indicate that the introduced feature-combination hybrid recommender systems can more accurately predict fitting playlist continuations as a result of their improved representation of songs occurring in few playlists.
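To make the feature-combination idea concrete: collaborative (playlist-membership) and content features can be concatenated into a single song representation, so songs that occur in few playlists still get a usable description from their content. The sketch below is a minimal illustration with assumed inputs and centroid-based scoring, not the paper's model.

```python
import numpy as np

def continue_playlist(playlist, membership, content, k=10):
    """Feature-combination hybrid playlist continuation (illustrative).

    playlist:   indices of songs already in the playlist
    membership: (n_songs, n_playlists) binary song-playlist matrix
                (collaborative signal, treated as song features)
    content:    (n_songs, d) audio/metadata descriptors per song
    """
    # Combine collaborative and content features into one representation.
    feats = np.hstack([membership, content]).astype(float)
    feats /= np.linalg.norm(feats, axis=1, keepdims=True) + 1e-9
    # Score candidates by cosine similarity to the playlist centroid.
    centroid = feats[playlist].mean(axis=0)
    scores = feats @ centroid
    scores[playlist] = -np.inf  # exclude songs already present
    return np.argsort(scores)[::-1][:k]
```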
...