Multi-Representation Knowledge Distillation For Audio Classification

Liang Gao, Kele Xu, Huaimin Wang, Yuxing Peng. Multimedia Tools and Applications.
As an important component of multimedia analysis, audio classification aims to discriminate between different types of audio signals and has received intensive attention due to its wide range of applications. Generally speaking, the raw signal can be transformed into various representations (such as the Short-Time Fourier Transform and Mel-Frequency Cepstral Coefficients), and the information contained in different representations can be complementary. Ensembling the models trained on different representations…
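One of the representations the abstract mentions, the Short-Time Fourier Transform magnitude spectrogram, can be sketched in a few lines of NumPy. This is a minimal illustration with assumed parameters (512-sample Hann window, 256-sample hop), not the authors' exact front end:

```python
import numpy as np

def stft_magnitude(signal, frame_len=512, hop=256):
    """Magnitude spectrogram via a windowed short-time FFT.
    Parameters are illustrative defaults, not from the paper."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft keeps only the non-negative frequency bins (frame_len//2 + 1)
    return np.abs(np.fft.rfft(frames, axis=1))

# Usage: a 1-second 440 Hz tone at 16 kHz sampling rate.
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
spec = stft_magnitude(sig)  # shape (n_frames, n_bins)
```

The peak energy lands near bin 440 * 512 / 16000 ≈ 14, as expected for a pure tone; an MFCC front end would further project such spectra onto a mel filterbank and apply a DCT.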
DarkGAN: Exploiting Knowledge Distillation for Comprehensible Audio Synthesis with GANs
This work distills knowledge from a large audio tagging system into an adversarial audio synthesizer called DarkGAN, and shows that DarkGAN can synthesize musical audio of acceptable quality and exhibits moderate attribute control even under out-of-distribution input conditioning.
Binaural Acoustic Scene Classification Using Wavelet Scattering, Parallel Ensemble Classifiers and Nonlinear Fusion
This research presents a hybrid method with a novel mathematical fusion step that aims to address the accuracy and adaptability limitations of current state-of-the-art ASC models.
Multimodal Deep Learning for Social Media Popularity Prediction With Attention Mechanism
A novel multimodal deep learning framework for the popularity prediction task is proposed, aiming to leverage complementary knowledge from different modalities; results show that it outperforms related approaches.
Teacher-Student Distillation for Heart Sound Classification
A distilled student CNN for binary heart sound classification is found to roughly match the performance of its teacher network while being half the teacher's size.
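The teacher-student distillation referenced here typically trains the student against the teacher's temperature-softened output distribution. A minimal sketch of the Hinton-style soft-target loss, with illustrative names and a temperature of 3.0 (an assumption, not a value from the paper):

```python
import numpy as np

def softmax(logits, T=1.0):
    """Numerically stable softmax with temperature T."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=3.0):
    """KL divergence between softened teacher and student distributions,
    scaled by T^2 so gradients stay comparable across temperatures."""
    p = softmax(teacher_logits, T)  # soft targets from the teacher
    q = softmax(student_logits, T)  # student prediction
    return float(np.sum(p * (np.log(p) - np.log(q)))) * T * T

# Identical logits give zero loss; mismatched logits give a positive loss.
same = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
diff = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

In practice this term is combined with the ordinary cross-entropy on hard labels via a weighting hyperparameter.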
The Sustainable Development of Intangible Cultural Heritage with AI: Cantonese Opera Singing Genre Classification Based on CoGCNet Model in China
A classification method based on the Cantonese opera Genre Classification Networks (CoGCNet) model achieves high classification accuracy, performs better overall than commonly used neural network models, and provides a new, feasible approach to the sustainable study of the singing characteristics of Cantonese opera genres.


SampleCNN: End-to-End Deep Convolutional Neural Networks Using Very Small Filters for Music Classification
A CNN architecture that learns representations using sample-level filters, going beyond typical frame-level input representations, is proposed; it is extended with a multi-level and multi-scale feature aggregation technique and subsequently used for transfer learning on several music classification tasks.
Meta learning based audio tagging
This paper describes a solution to the general-purpose audio tagging task, one of the subtasks in the DCASE 2018 challenge, and proposes a meta-learning-based ensemble method that provides higher prediction accuracy and robustness compared to a single model.
Mixup-Based Acoustic Scene Classification Using Multi-Channel Convolutional Neural Network
This paper explores the use of a multi-channel CNN for the classification task, which extracts features from different channels in an end-to-end manner, and explores the use of the mixup method, which provides higher prediction accuracy and robustness compared with previous models.
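The mixup method referenced in this and other entries blends pairs of training examples and their labels with a Beta-distributed coefficient. A minimal sketch, with an assumed alpha of 0.2 and a fixed seed for reproducibility (both illustrative choices, not the papers' settings):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two examples and their one-hot labels with lam ~ Beta(alpha, alpha)."""
    if rng is None:
        rng = np.random.default_rng(0)  # fixed seed for a reproducible sketch
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y

# Usage: two 1-second mono clips and their one-hot scene labels.
x1, x2 = np.ones(16000), np.zeros(16000)
y1, y2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
x, y = mixup(x1, y1, x2, y2)
```

The mixed label is a convex combination of the originals, so the model is trained against soft targets rather than hard class assignments.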
Learning and Fusing Multimodal Deep Features for Acoustic Scene Categorization
A novel acoustic scene classification system based on multimodal deep feature fusion is proposed, in which three CNNs perform 1D raw-waveform modeling, 2D time-frequency image modeling, and 3D spatial-temporal dynamics modeling, respectively.
Sample Dropout for Audio Scene Classification Using Multi-Scale Dense Connected Convolutional Neural Network
Inspired by silence removal in speech signal processing, a novel sample dropout approach is proposed that removes outliers from the training dataset and further improves the classification robustness of a multi-scale DenseNet.
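The silence-removal idea that motivates this sample dropout can be illustrated with a simple energy threshold over frames. This is a loosely analogous, hypothetical sketch (quantile threshold and frame layout are assumptions, not the authors' criterion):

```python
import numpy as np

def drop_low_energy_frames(frames, quantile=0.1):
    """Keep frames whose mean energy exceeds the given quantile threshold.
    frames: array of shape (n_frames, frame_len)."""
    energy = (frames ** 2).mean(axis=1)
    thresh = np.quantile(energy, quantile)
    return frames[energy > thresh]

# Usage: nine active frames plus one silent frame; the silent one is dropped.
frames = np.vstack([np.ones((9, 4)), np.zeros((1, 4))])
kept = drop_low_energy_frames(frames)
```

The paper's sample dropout generalizes this idea from silent frames to outlier training samples.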
DCAR: A Discriminative and Compact Audio Representation for Audio Processing
Variants of the proposed DCAR representation consistently outperform four popular audio representations, and the paper discusses how these performance differences across tasks follow from how each type of model leverages (or fails to leverage) the intrinsic structure of the data.
Sample Mixed-Based Data Augmentation for Domestic Audio Tagging
A convolutional recurrent neural network with an attention module, taking log-scaled mel spectra as input, is applied to audio tagging as a baseline system; with the mixup approach it achieves a state-of-the-art equal error rate (EER) of 0.10 on the DCASE 2016 Task 4 dataset, outperforming the baseline without data augmentation.
Detection and Classification of Acoustic Scenes and Events: Outcome of the DCASE 2016 Challenge
The emergence of deep learning as the most popular classification method is observed, replacing the traditional approaches based on Gaussian mixture models and support vector machines.
Environmental Sound Classification Based on Multi-temporal Resolution Convolutional Neural Network Combining with Multi-level Features
Results demonstrate that the proposed method is highly effective in classification tasks thanks to its multi-temporal-resolution and multi-level features, outperforming previous methods that account only for single-level features.
Classification of audio signals using AANN and GMM