Publications
Sample-level Deep Convolutional Neural Networks for Music Auto-tagging Using Raw Waveforms
TLDR
The experiments show that deep architectures with sample-level filters improve accuracy in music auto-tagging, yielding results comparable to previous state-of-the-art performance on the MagnaTagATune dataset and the Million Song Dataset.
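A minimal sketch of the sample-level idea, assuming PyTorch (layer sizes and tag count here are illustrative, not the paper's exact configuration): stacks of 1-D convolutions with very small, size-3 filters consume the raw waveform directly instead of frame-level spectrogram features.

    import torch
    import torch.nn as nn

    class SampleLevelCNN(nn.Module):
        # Illustrative sample-level CNN: size-3 filters over raw audio.
        def __init__(self, n_tags=50, n_blocks=5, channels=64):
            super().__init__()
            # Strided front end replaces frame-level windowing: filter 3, stride 3.
            self.front = nn.Sequential(
                nn.Conv1d(1, channels, kernel_size=3, stride=3),
                nn.BatchNorm1d(channels), nn.ReLU())
            blocks = []
            for _ in range(n_blocks):  # each block shrinks time resolution by 3x
                blocks += [nn.Conv1d(channels, channels, kernel_size=3, padding=1),
                           nn.BatchNorm1d(channels), nn.ReLU(), nn.MaxPool1d(3)]
            self.blocks = nn.Sequential(*blocks)
            self.head = nn.Sequential(nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                                      nn.Linear(channels, n_tags))

        def forward(self, waveform):  # waveform: (batch, 1, samples)
            return torch.sigmoid(self.head(self.blocks(self.front(waveform))))

    tags = SampleLevelCNN()(torch.randn(2, 1, 3 ** 10))  # ~59k raw samples per clip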
Multi-Level and Multi-Scale Feature Aggregation Using Pretrained Convolutional Neural Networks for Music Auto-Tagging
TLDR
The experiments show that combining multi-level and multi-scale features is highly effective in music auto-tagging and that the proposed method outperforms previous state-of-the-art methods on the MagnaTagATune dataset and the Million Song Dataset.
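A hedged sketch of the aggregation idea, again assuming PyTorch (the pooling choice and layer shapes are illustrative): activations taken from several levels of a pretrained CNN are pooled over time and concatenated into one feature vector for the tagging classifier.

    import torch
    import torch.nn as nn

    def aggregate_levels(feature_maps):
        # feature_maps: list of tensors shaped (batch, channels_i, time_i),
        # one per CNN level; pool over time, then concatenate channels.
        return torch.cat([fm.mean(dim=2) for fm in feature_maps], dim=1)

    # Toy example: three levels with different channel and time sizes.
    maps = [torch.randn(4, 64, 81), torch.randn(4, 128, 27), torch.randn(4, 256, 9)]
    features = aggregate_levels(maps)              # (4, 448)
    classifier = nn.Linear(features.shape[1], 50)  # 50 tags
    scores = torch.sigmoid(classifier(features))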
Sample-Level CNN Architectures for Music Auto-Tagging Using Raw Waveforms
TLDR
This paper improves the 1-D CNN architecture for music auto-tagging by adopting building blocks from state-of-the-art image classification models (ResNets and SENets) and adding multi-level feature aggregation, then compares different combinations of these modules in building CNN architectures.
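Of the adopted building blocks, squeeze-and-excitation translates naturally to 1-D; a minimal sketch, assuming PyTorch (the reduction ratio is illustrative):

    import torch
    import torch.nn as nn

    class SEBlock1d(nn.Module):
        # Squeeze-and-excitation for 1-D feature maps: reweight channels
        # with a gate computed from globally pooled activations.
        def __init__(self, channels, reduction=16):
            super().__init__()
            self.gate = nn.Sequential(
                nn.Linear(channels, channels // reduction), nn.ReLU(),
                nn.Linear(channels // reduction, channels), nn.Sigmoid())

        def forward(self, x):              # x: (batch, channels, time)
            squeezed = x.mean(dim=2)       # squeeze: average over time
            weights = self.gate(squeezed)  # excitation: per-channel gate
            return x * weights.unsqueeze(2)

    y = SEBlock1d(64)(torch.randn(2, 64, 81))  # same shape, channels reweighted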
SampleCNN: End-to-End Deep Convolutional Neural Networks Using Very Small Filters for Music Classification
TLDR
A CNN architecture that learns representations using sample-level filters, going beyond typical frame-level input representations, is proposed, extended with a multi-level and multi-scale feature aggregation technique, and subsequently used for transfer learning on several music classification tasks.
Deep Content-User Embedding Model for Music Recommendation
TLDR
This work proposes the deep content-user embedding model, a simple and intuitive architecture that combines user-item interactions with music audio content, and evaluates the model on music recommendation and music auto-tagging tasks.
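A minimal sketch of the combination, assuming PyTorch (the audio encoder and all dimensions are placeholders): user IDs map to learned embeddings, audio maps through a content encoder to item embeddings, and a dot product scores each user-item pair.

    import torch
    import torch.nn as nn

    class ContentUserModel(nn.Module):
        # Score a (user, audio) pair as a dot product of two embeddings.
        def __init__(self, n_users, dim=128):
            super().__init__()
            self.user_emb = nn.Embedding(n_users, dim)
            self.audio_enc = nn.Sequential(    # placeholder content encoder
                nn.Conv1d(1, 32, kernel_size=8, stride=4), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(32, dim))

        def forward(self, user_ids, audio):    # audio: (batch, 1, samples)
            return (self.user_emb(user_ids) * self.audio_enc(audio)).sum(dim=1)

    model = ContentUserModel(n_users=1000)
    scores = model(torch.tensor([3, 7]), torch.randn(2, 1, 16000))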
Comparison and Analysis of SampleCNN Architectures for Audio Classification
TLDR
SampleCNN is scrutinized further by comparing it with spectrogram-based CNNs and by changing the subsampling operation across three different audio domains; the analysis shows that the excitation in the first layer is sensitive to loudness, an acoustic characteristic that distinguishes different genres of music.
Disentangled Multidimensional Metric Learning for Music Similarity
TLDR
This work adapts a variant of deep metric learning called conditional similarity networks to the audio domain, extends it with track-based information to control the specificity of the model, and introduces the concept of multidimensional similarity.
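The conditional-similarity idea can be sketched as learned masks over a shared embedding, assuming PyTorch (shapes and the number of conditions are illustrative): each similarity dimension, such as genre or mood, selects its own subspace before the distance is computed.

    import torch
    import torch.nn as nn

    class ConditionalSimilarity(nn.Module):
        # Shared embedding space plus one learned mask per condition.
        def __init__(self, emb_dim=128, n_conditions=4):
            super().__init__()
            self.masks = nn.Parameter(torch.rand(n_conditions, emb_dim))

        def distance(self, a, b, condition):       # a, b: (batch, emb_dim)
            m = torch.relu(self.masks[condition])  # non-negative mask
            return ((a - b) ** 2 * m).sum(dim=1)   # masked squared distance

    csn = ConditionalSimilarity()
    a, b = torch.randn(2, 128), torch.randn(2, 128)
    d_genre = csn.distance(a, b, condition=0)  # distance along one dimension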
Zero-shot Learning for Audio-based Music Classification and Tagging
TLDR
This work investigates zero-shot learning in the music domain and organizes two setups of side information: human-labeled attribute information based on the Free Music Archive and OpenMIC-2018 datasets, and general word semantic information from the Million Song Dataset and Last.fm tag annotations.
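A hedged sketch of the zero-shot setup, assuming PyTorch (both projectors are placeholders): audio embeddings and tag side-information vectors (attributes or word embeddings) are projected into a shared space, so unseen tags can be scored purely by similarity.

    import torch
    import torch.nn as nn

    dim = 64
    audio_proj = nn.Linear(128, dim)  # placeholder projector for audio embeddings
    tag_proj = nn.Linear(300, dim)    # e.g., projects 300-d tag word vectors

    audio_emb = audio_proj(torch.randn(2, 128))  # 2 audio clips
    tag_emb = tag_proj(torch.randn(10, 300))     # 10 tags, including unseen ones

    # Score every (clip, tag) pair in the shared space; an unseen tag needs
    # only its side-information vector, not labeled training examples.
    scores = audio_emb @ tag_emb.T               # (2, 10)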
Combining Multi-Scale Features Using Sample-Level Deep Convolutional Neural Networks for Weakly Supervised Sound Event Detection
TLDR
This paper describes the method submitted to the large-scale weakly supervised sound event detection task for smart cars in the DCASE 2017 Challenge and shows that waveform-based models can be comparable to spectrogram-based models among the other DCASE 2017 Task 4 submissions.
Deep Learning for Audio-Based Music Classification and Tagging: Teaching Computers to Distinguish Rock from Bach
TLDR
Over the last decade, music-streaming services have grown dramatically, and giant technology companies such as Apple, Google, and Amazon have also been strengthening their music service platforms, providing listeners with a new and easily accessible way to listen to music.
...