Multimodal Deep Learning
This work presents a series of tasks for multimodal learning, shows how to train deep networks that learn features to address these tasks, and demonstrates cross-modality feature learning, in which better features for one modality can be learned when multiple modalities are present at feature-learning time.
Sample-level Deep Convolutional Neural Networks for Music Auto-tagging Using Raw Waveforms
The experiments show that deep architectures with sample-level filters improve accuracy in music auto-tagging, yielding results comparable to previous state-of-the-art performance on the MagnaTagATune dataset and the Million Song Dataset.
Multi-Level and Multi-Scale Feature Aggregation Using Pretrained Convolutional Neural Networks for Music Auto-Tagging
The experiments show that combining multi-level and multi-scale features is highly effective in music auto-tagging, and that the proposed method outperforms previous state-of-the-art methods on the MagnaTagATune dataset and the Million Song Dataset.
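The aggregation idea summarized above can be illustrated with a minimal NumPy sketch (all layer counts, filter weights, and the mean-pooling choice here are illustrative assumptions, not the paper's exact architecture): instead of keeping only the last layer's output, each level's activations are globally pooled and collected as one combined feature vector.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_relu(x, w, stride=3):
    """Valid 1-D convolution with stride, followed by ReLU."""
    k = len(w)
    n = (len(x) - k) // stride + 1
    out = np.array([x[i * stride:i * stride + k] @ w for i in range(n)])
    return np.maximum(out, 0.0)

# A toy stack of four convolutional "levels" over a short input signal.
x = rng.standard_normal(729)  # 3**6 samples
weights = [rng.standard_normal(3) * 0.5 for _ in range(4)]

level_features = []
h = x
for w in weights:
    h = conv1d_relu(h, w)          # one level of the network
    # Multi-level aggregation: global average-pool each level's
    # activations and keep them all, not just the last layer's.
    level_features.append(h.mean())

aggregated = np.array(level_features)  # one pooled feature per level
print(aggregated.shape)                # (4,)
```

A classifier trained on `aggregated` then sees information from every depth of the network at once, which is the core of the multi-level scheme.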
Melody Extraction on Vocal Segments Using Multi-Column Deep Neural Networks
A classification-based approach for melody extraction on vocal segments is presented, using multi-column deep neural networks, each trained to predict a pitch label of the singing voice from a spectrogram but with a different output pitch resolution.
Sample-Level CNN Architectures for Music Auto-Tagging Using Raw Waveforms
- Taejun Kim, Jongpil Lee, Juhan Nam
- Computer Science · IEEE International Conference on Acoustics…
- 28 October 2017
This paper improves the 1-D CNN architecture for music auto-tagging by adopting building blocks from state-of-the-art image-classification models (ResNets and SENets) and adding multi-level feature aggregation, and it compares different combinations of these modules for building CNN architectures.
SampleCNN: End-to-End Deep Convolutional Neural Networks Using Very Small Filters for Music Classification
A CNN architecture that learns representations using sample-level filters, beyond typical frame-level input representations, is proposed and extended with a multi-level and multi-scale feature-aggregation technique; transfer learning is subsequently conducted for several music classification tasks.
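A minimal sketch of the sample-level idea, assuming the commonly cited input length of 3**10 = 59049 raw samples and very small filters of length 3 with stride 3 (weights and layer count here are toy placeholders, not trained parameters): each layer shrinks the sequence by a factor of three, so ten layers collapse the raw waveform to a single frame of learned features.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_level_conv(x, w, stride=3):
    """Valid 1-D convolution with a very small filter, then ReLU."""
    k = len(w)
    n = (len(x) - k) // stride + 1
    out = np.array([x[i * stride:i * stride + k] @ w for i in range(n)])
    return np.maximum(out, 0.0)

# Raw waveform of 3**10 = 59049 samples, fed directly to the network
# with no hand-crafted spectrogram front end.
x = rng.standard_normal(3 ** 10)

h = x
for _ in range(10):                 # ten conv layers, filter length 3, stride 3
    w = rng.standard_normal(3) * 0.5
    h = sample_level_conv(h, w)     # each layer shrinks the length 3x

print(len(h))                       # 1: the waveform is reduced to one frame
```

The contrast with frame-level models is that the first layer here operates on individual samples rather than on precomputed spectrogram frames.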
Learning Sparse Feature Representations for Music Annotation and Retrieval
A systematic approach to applying feature-learning algorithms to music data is described, focusing in particular on a high-dimensional sparse-feature representation; using only a linear classifier, the newly learned features produce results on the CAL500 dataset comparable to state-of-the-art music annotation and retrieval systems.
A Classification-Based Polyphonic Piano Transcription Approach Using Learned Feature Representations
This paper applies deep belief networks to musical data and evaluates the learned feature representations on classification-based polyphonic piano transcription and suggests a way of training classifiers jointly for multiple notes to improve training speed and classification performance.
Joint Detection and Classification of Singing Voice Melody Using Convolutional Recurrent Neural Networks
A joint detection and classification network that conducts singing voice detection and pitch estimation simultaneously is presented, outperforming state-of-the-art algorithms on the evaluated datasets.
Deep Content-User Embedding Model for Music Recommendation
This work proposes a deep content-user embedding model, a simple and intuitive architecture that combines user-item interactions with music audio content; the model is evaluated on music recommendation and music auto-tagging tasks.
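The combination of collaborative and content signals described above can be sketched as follows (a minimal NumPy sketch; the embedding size, the mean-pooling audio encoder, and the dot-product score are simplifying assumptions standing in for the learned networks): users get a learned embedding table, tracks get an embedding computed from their audio, and a recommendation score is the dot product of the two in a shared space.

```python
import numpy as np

rng = np.random.default_rng(0)

emb_dim, n_users, n_mels = 32, 100, 48

# User side: one learned embedding per user (a lookup table).
user_emb = rng.standard_normal((n_users, emb_dim)) * 0.1

# Content side: a stand-in for the audio network, here a single
# projection of a time-pooled mel-spectrogram.
W_audio = rng.standard_normal((emb_dim, n_mels)) * 0.1

def audio_encoder(mel):
    pooled = mel.mean(axis=1)   # (n_mels,) average over time frames
    return W_audio @ pooled     # (emb_dim,) item vector in shared space

mel = rng.standard_normal((n_mels, 128))  # one track's spectrogram
item_vec = audio_encoder(mel)

# Score every user against this track; ranking these scores gives
# recommendations, and new tracks need no interaction history at all.
scores = user_emb @ item_vec              # (n_users,)
print(scores.shape)
```

Because the item vector comes from audio alone, the model can score tracks that have no listening history, which is the usual motivation for content-aware recommenders.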