Multi-Representation Knowledge Distillation For Audio Classification

@article{Gao2022MultiRepresentationKD,
  title={Multi-Representation Knowledge Distillation For Audio Classification},
  author={Liang Gao and Kele Xu and Huaimin Wang and Yuxing Peng},
  journal={Multimedia Tools and Applications},
  year={2022},
  volume={81},
  pages={5089--5112}
}
As an important component of multimedia analysis tasks, audio classification aims to discriminate between different types of audio signal and has received intensive attention due to its wide range of applications. Generally speaking, the raw signal can be transformed into various representations (such as the Short-Time Fourier Transform and Mel-Frequency Cepstral Coefficients), and the information carried by different representations can be complementary. Ensembling the models trained on different representations… 
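The two representations named in the abstract can be sketched in a few lines of NumPy: an STFT magnitude spectrogram and a simplified MFCC-like feature computed from the same signal. This is an illustrative sketch only, not the paper's pipeline; the window size, hop length, sample rate, and number of mel bands are assumed values.

```python
import numpy as np

def stft_mag(x, n_fft=512, hop=256):
    """Magnitude spectrogram via a sliding Hann window and rfft."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i*hop : i*hop + n_fft] * win for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T  # (n_fft//2 + 1, n_frames)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filters (simplified construction)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fb[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[m - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc(x, sr=16000, n_fft=512, hop=256, n_mels=26, n_ceps=13):
    """MFCC-like features: log mel energies followed by a DCT-II."""
    S = stft_mag(x, n_fft, hop) ** 2
    mel = np.log(mel_filterbank(n_mels, n_fft, sr) @ S + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_mels)))
    return dct @ mel  # (n_ceps, n_frames)

sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)  # one second of a 440 Hz tone
S = stft_mag(x)                  # STFT representation
C = mfcc(x)                      # MFCC-like representation
print(S.shape, C.shape)
```

A multi-representation system in the spirit of the paper would train one model per such representation and combine (or distill) their outputs.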
Investigating Multi-Feature Selection and Ensembling for Audio Classification
TLDR
An extensive evaluation of several cutting-edge DL models with various state-of-the-art audio features, focusing on feature selection, suggests that feature selection depends on both the dataset and the model.
Classification of audio signals using SVM-WOA in Hadoop map-reduce framework
TLDR
The proposed audio classification algorithm is compared with several existing classification algorithms to demonstrate its efficiency and accuracy, and it uses the MapReduce approach, a form of big-data processing, to perform classification on unstructured data.
DarkGAN: Exploiting Knowledge Distillation for Comprehensible Audio Synthesis with GANs
TLDR
This work performs knowledge distillation from a large audio tagging system into an adversarial audio synthesizer that is called DarkGAN, and shows that DarkGAN can synthesize musical audio with acceptable quality and exhibits moderate attribute control even with out-of-distribution input conditioning.
Binaural Acoustic Scene Classification Using Wavelet Scattering, Parallel Ensemble Classifiers and Nonlinear Fusion
TLDR
This research presents a hybrid method that includes a novel mathematical fusion step which aims to tackle the challenges of ASC accuracy and adaptability of current state-of-the-art models.
Multimodal Deep Learning for Social Media Popularity Prediction With Attention Mechanism
TLDR
A novel multimodal deep learning framework for the popularity prediction task, which aims to leverage the complementary knowledge from different modalities, is proposed and results show that the proposed framework outperforms related approaches.
Teacher-Student Distillation for Heart Sound Classification
TLDR
A distilled teacher-student CNN for binary heart sound classification is found to roughly match the performance of the teacher network while being half its size.
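The teacher-student setup referred to above can be illustrated with the standard soft-label distillation loss (Hinton et al., 2015): the student is trained against the teacher's temperature-softened outputs plus the usual cross-entropy on ground-truth labels. The temperature and weighting below are assumed illustrative values, not settings from any of the cited papers.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax along the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(labels, student).

    The T^2 rescaling keeps the soft-target gradient magnitude comparable
    across temperatures.
    """
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12)
    return float(np.mean(alpha * (T ** 2) * kl + (1 - alpha) * ce))

# Toy batch: 2 examples, 3 classes.
teacher = np.array([[5.0, 1.0, 0.0], [0.0, 4.0, 1.0]])
student = np.array([[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]])
labels = np.array([0, 1])
loss = distillation_loss(student, teacher, labels)
print(round(loss, 4))
```

A student whose logits already match the teacher's incurs only the (small) hard-label term, so the loss drives the student toward the teacher's soft output distribution.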
The Sustainable Development of Intangible Cultural Heritage with AI: Cantonese Opera Singing Genre Classification Based on CoGCNet Model in China
TLDR
A classification method based on the Cantonese opera Genre Classification Networks (CoGCNet) model achieves high classification accuracy, with overall performance better than that of commonly used neural network models, offering a new feasible approach to the sustainable study of the singing characteristics of Cantonese opera genres.

References

SHOWING 1-10 OF 70 REFERENCES
SampleCNN: End-to-End Deep Convolutional Neural Networks Using Very Small Filters for Music Classification
TLDR
A CNN architecture which learns representations using sample-level filters beyond typical frame-level input representations is proposed and extended using multi-level and multi-scale feature aggregation technique and subsequently conduct transfer learning for several music classification tasks.
Meta learning based audio tagging
TLDR
This paper describes a solution for the general-purpose audio tagging task, one of the subtasks in the DCASE 2018 challenge, and proposes a meta-learning-based ensemble method that provides higher prediction accuracy and robustness compared to a single model.
Mixup-Based Acoustic Scene Classification Using Multi-Channel Convolutional Neural Network
TLDR
This paper explores the use of a multi-channel CNN for the classification task, which extracts features from different channels in an end-to-end manner, and the use of the mixup method, which provides higher prediction accuracy and robustness than previous models.
Learning and Fusing Multimodal Deep Features for Acoustic Scene Categorization
TLDR
A novel acoustic scene classification system based on multimodal deep feature fusion is proposed, where three CNNs have been presented to perform 1D raw waveform modeling, 2D time-frequency image modeling, and 3D spatial-temporal dynamics modeling, respectively.
Sample Dropout for Audio Scene Classification Using Multi-Scale Dense Connected Convolutional Neural Network
TLDR
Inspired by silence removal in speech signal processing, a novel sample dropout approach is proposed, which removes outliers from the training dataset and can further improve the classification robustness of multi-scale DenseNet.
DCAR: A Discriminative and Compact Audio Representation for Audio Processing
TLDR
Variants of the proposed DCAR representation consistently outperform four popular audio representations, and the performance differences across tasks are discussed in terms of how each type of model leverages (or does not leverage) the intrinsic structure of the data.
Sample Mixed-Based Data Augmentation for Domestic Audio Tagging
TLDR
A convolutional recurrent neural network with an attention module, using log-scaled mel spectrograms, is applied to audio tagging as a baseline system; with the mixup approach it achieves a state-of-the-art equal error rate (EER) of 0.10 on the DCASE 2016 Task 4 dataset, outperforming the baseline system without data augmentation.
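Mixup, used in the two entries above, forms convex combinations of pairs of training examples and their labels, with the mixing coefficient drawn from a Beta distribution. A minimal sketch follows; the Beta parameter and spectrogram shapes are assumed common choices, not necessarily those used in these papers.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Mix two examples and their one-hot labels with lambda ~ Beta(alpha, alpha)."""
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

# Toy mel-spectrogram "examples" for two classes, with one-hot labels
# (shapes are assumed for illustration: 64 mel bands x 100 frames).
rng = np.random.default_rng(42)
spec_a = rng.random((64, 100))
spec_b = rng.random((64, 100))
y_a = np.array([1.0, 0.0])
y_b = np.array([0.0, 1.0])

x_mix, y_mix = mixup(spec_a, y_a, spec_b, y_b)
print(y_mix, y_mix.sum())
```

The mixed label is a soft distribution that sums to one, so the model is trained with a cross-entropy against these soft targets rather than hard class indices.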
Detection and Classification of Acoustic Scenes and Events: Outcome of the DCASE 2016 Challenge
TLDR
The emergence of deep learning as the most popular classification method is observed, replacing the traditional approaches based on Gaussian mixture models and support vector machines.
Environmental Sound Classification Based on Multi-temporal Resolution Convolutional Neural Network Combining with Multi-level Features
TLDR
Results demonstrate that the proposed method is highly effective in classification tasks by employing multi-temporal resolution and multi-level features, and it outperforms previous methods that only account for single-level features.
Classification of audio signals using AANN and GMM